I am using MATLAB for this (preferably).
I need to multiply the frequency (count) in each bin of a histogram by a scalar value.
I have tried the approach from a similar question, but it is written for hist, not for the histogram function.
This is my original distribution that needs to be multiplied:
This is what I get using the approach given in the similar question:
Additionally, when I finish this part I will have more histograms that I need to sum up into one histogram. So how would I do that? They might have different ranges.
The documentation clearly explains how to replicate the behavior of hist with histogram.
For example:
A = rand(100, 1);
h = histogram(A);
figure
h_new = histogram('BinCounts', h.Values*2, 'BinEdges', h.BinEdges);
Generates the following histograms:
You can modify the BinCounts like this:
X = normrnd(0,1,1000,1); % some data
h = histogram(X,3); % histogram with 3 bins
h.BinCounts = h.Values.*[3 5 1]; % scale each bin by factor 3, 5 and 1 respectively
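For the follow-up about summing several histograms with different ranges, one possible approach (a sketch, not taken from the answers above) is to bin every data set on one shared set of edges with histcounts, so the scaled counts can be added element-wise. The data sets A and B, the scale factors, and the 20-bin grid are made-up assumptions.

```matlab
% Hypothetical data sets covering different ranges
A = randn(500, 1);
B = 3 + 2*randn(800, 1);

% Shared bin edges spanning both data sets (20 bins is an arbitrary choice)
edges = linspace(min([A; B]), max([A; B]), 21);

% Count each data set on the same edges
countsA = histcounts(A, edges);
countsB = histcounts(B, edges);

% Scale per data set (a vector of factors would scale per bin), then add bin by bin
total = 2*countsA + 1*countsB;

% Draw the combined histogram from the precomputed counts
histogram('BinEdges', edges, 'BinCounts', total);
```

Because both count vectors refer to the same bins, the sum is well defined even though A and B span different ranges; bins that one data set never reaches simply contribute zero.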
I wish to normalize my histogram, but for some reason I get some error in my code.
N = 1000;
mu = 5; stdev = 2;
x = mu+stdev*randn(N,1);
bin=mu-6*stdev:0.5:mu+6*stdev;
f=hist(x,bin);
plot(bin,f,'bo');
counts = f.Values;
sum_counts = sum(counts);
width = f.BinWidth;
area = sum_counts*width;
I get to plot my histogram but I get an error in normalization. I know that the histogram() function supports normalization but I am trying to avoid that.
Dot indexing is not supported for variables of this type.
counts = f.Values;
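When you write f=hist(x,bin); you assign the histogram counts to f as a plain vector, not an object, so f.Values and f.BinWidth do not exist. That is what the error is telling you. To normalize so that the counts sum to 1, use f./sum(f). To normalize so that the area under the curve is 1, also divide by the bin width (0.5 here): f./(sum(f)*0.5).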
Note that hist is no longer recommended and has been replaced by histogram.
There are normalisation options as name-value pairs when creating the histogram.
histogram(x,bin,'Normalization','pdf'); or histogram(x,bin,'Normalization','probability');, for example, may be what you are looking for. The full range of normalisation options can be found in the doc.
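If you want to avoid the built-in normalization options, a minimal sketch of doing it by hand follows. It uses histcounts rather than hist, and the variable names edges, probEst and pdfEst are illustrative assumptions.

```matlab
N = 1000;
mu = 5; stdev = 2;
x = mu + stdev*randn(N, 1);

binWidth = 0.5;
edges = mu-6*stdev : binWidth : mu+6*stdev;   % bin edges, not centers
counts = histcounts(x, edges);                % plain vector of counts

probEst = counts / sum(counts);               % bar heights sum to 1
pdfEst  = counts / (sum(counts)*binWidth);    % bar areas sum to 1

centers = edges(1:end-1) + binWidth/2;        % bin midpoints for plotting
plot(centers, pdfEst, 'bo');
```

The two estimates differ only by the constant factor binWidth, which is why a probability-scaled histogram and a pdf-scaled one have the same shape but different heights.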
I am trying to fit a Poisson function to a histogram in Matlab: the example calls for using hist() (which is deprecated) so I want to use histogram() instead, especially as you cannot seem to normalize a hist(). I then want to apply a poisson function to it using poisspdf() or any other standard function (preferably no toolboxes!). The histogram is probability scaled, which is where the issue with the poisson function comes from AFAIK.
clear
clc
lambda = 5;
range = 1000;
rangeVec = 1:range;
randomData = poissrnd(lambda, 1, range);
histoFigure = histogram(randomData, 'Normalization', 'probability');
hold on
poissonFunction = poisspdf(randomData, lambda);
poissonFunction2 = poisspdf(histoFigure, lambda);
plot(poissonFunction)
plot(poissonFunction2)
I have tried several ways of creating and plotting the Poisson function, and none of them seems to work: the values within this function are not consistent with the histogram values, as they differ by several decimal places.
This is what the image should look like
however currently I can only get the bar graphs to show up correctly.
You're not specifying the x-data of your curve, so the sample index is used as x, and since you have 1000 samples you get the ugly plot. The x-data that you should use is randomData. Using
plot(randomData, poissonFunction)
will lead to lines between different samples, because the samples follow each other randomly. To take each sample only once, you can use unique. It is important that the x and y values stay connected to each other, so it's best to put randomData and poissonFunction in 1 matrix, and then use unique:
d = [randomData;poissonFunction].'; % make 1000x2 matrix to find unique among rows
d = unique(d,'rows');
You can use d to plot the data.
Full code:
clear
clc
lambda = 5;
range = 1000;
rangeVec = 1:range;
randomData = poissrnd(lambda, 1, range);
histoFigure = histogram(randomData, 'Normalization', 'probability');
hold on
poissonFunction = poisspdf(randomData, lambda);
d = [randomData; poissonFunction].';
d = unique(d, 'rows');
plot(d(:,1), d(:,2))
With this result:
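Since the question asked for something that works without toolboxes, here is a sketch that evaluates the Poisson probability mass function directly from its formula on an integer grid. The small data vector is a made-up sample standing in for randomData, and the upper limit of 15 is an arbitrary choice.

```matlab
lambda = 5;
data = [3 5 4 6 5 7 2 5 4 8 5 6 3 4 5 9 1 5 6 4];   % hypothetical counts

% Poisson pmf from its definition: P(k) = exp(-lambda) * lambda^k / k!
k = 0:15;
pmf = exp(-lambda) .* lambda.^k ./ factorial(k);

% One bin per integer, so bar heights are directly comparable to the pmf
histogram(data, 'BinMethod', 'integers', 'Normalization', 'probability');
hold on
plot(k, pmf, 'r-o');
hold off
```

Evaluating on the sorted grid k avoids the unique step entirely, and with the Statistics Toolbox available poisspdf(k, lambda) could replace the formula line.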
I need to compute a moving average over a data series, within a for loop. I have to get the moving average over N=9 days. The array I'm computing in is 4 series of 365 values (M), which itself are mean values of another set of data. I want to plot the mean values of my data with the moving average in one plot.
I googled a bit about moving averages and the "conv" command and found something which I tried implementing in my code:
hold on
for ii=1:4;
M=mean(C{ii},2)
wts = [1/24;repmat(1/12,11,1);1/24];
Ms=conv(M,wts,'valid')
plot(M)
plot(Ms,'r')
end
hold off
So basically, I compute my mean and plot it with a (wrong) moving average. I picked the "wts" value right off the mathworks site, so that is incorrect. (source: http://www.mathworks.nl/help/econ/moving-average-trend-estimation.html) My problem though, is that I do not understand what this "wts" is. Could anyone explain? If it has something to do with the weights of the values: that is invalid in this case. All values are weighted the same.
And if I am doing this entirely wrong, could I get some help with it?
My sincerest thanks.
There are two more alternatives:
1) filter
From the doc:
You can use filter to find a running average without using a for loop.
This example finds the running average of a 16-element vector, using a
window size of 5.
data = [1:0.2:4]';
windowSize = 5;
filter(ones(1,windowSize)/windowSize,1,data)
2) smooth as part of the Curve Fitting Toolbox (which is available in most cases)
From the doc:
yy = smooth(y) smooths the data in the column vector y using a moving
average filter. Results are returned in the column vector yy. The
default span for the moving average is 5.
% Create noisy data with outliers:
x = 15*rand(150,1);
y = sin(x) + 0.5*(rand(size(x))-0.5);
y(ceil(length(x)*rand(2,1))) = 3;
% Smooth the data using the loess and rloess methods with a span of 10%:
yy1 = smooth(x,y,0.1,'loess');
yy2 = smooth(x,y,0.1,'rloess');
In 2016 MATLAB added the movmean function that calculates a moving average:
N = 9;
M_moving_average = movmean(M,N)
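As a quick check of how movmean relates to the conv approach, the sketch below compares the two on a made-up series (the vector M here is a stand-in for your data). For an odd window length, movmean centers the window and shrinks it at the ends, so only the interior samples match the 'valid' convolution.

```matlab
M = cumsum(randn(365, 1));        % hypothetical series of 365 daily values
N = 9;                            % window length in days

viaMovmean = movmean(M, N);                  % same length as M
viaConv    = conv(M, ones(N,1)/N, 'valid');  % length 365 - N + 1

half = (N - 1)/2;                 % samples on each side of the window center
interior = viaMovmean(half+1 : end-half);

maxDiff = max(abs(interior - viaConv));      % should be at rounding-error level
```

When plotting the 'valid' result against the original series, shift it by half samples, for example plot(half+1 : numel(M)-half, viaConv), so the two curves line up.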
Using conv is an excellent way to implement a moving average. In the code you are using, wts is how much you are weighting each value (as you guessed). The sum of that vector should always be equal to one. If you wish to weight each value evenly and apply a moving-average filter of size N, then you would want to do
N = 7;
wts = ones(N,1)/N;
sum(wts) % result = 1
Using the 'valid' argument in conv will result in having fewer values in Ms than you have in M. Use 'same' if you don't mind the effects of zero padding. If you have the signal processing toolbox you can use cconv if you want to try a circular moving average. Something like
N = 7;
wts = ones(N,1)/N;
cconv(x,wts,length(x)); % third argument is the output length, so use the signal length
should work.
You should read the conv and cconv documentation for more information if you haven't already.
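To see the zero-padding effect of 'same' mentioned above, the sketch below filters a constant signal, where the true moving average is 1 everywhere. Dividing by the filtered version of an all-ones vector is one common way to correct the edges; that correction is an addition here, not part of the answer above.

```matlab
N = 7;
wts = ones(N, 1)/N;
x = ones(20, 1);                      % constant test signal

ySame = conv(x, wts, 'same');         % edges are pulled toward zero by the padding

% Fraction of each window that actually overlaps the data
coverage = conv(ones(size(x)), wts, 'same');

yFixed = ySame ./ coverage;           % renormalized at the edges
```

In this example the first output of ySame is 4/7 rather than 1, because only four of the seven samples in the first window lie inside the data.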
I would use this:
% does moving average on signal x, window size is w
function y = movingAverage(x, w)
k = ones(1, w) / w; % equal weights that sum to one
y = conv(x, k, 'same');
end
ripped straight from here.
To comment on your current implementation: wts is the weighting vector. The one from the MathWorks example is a 13-point weighted average in which the first and last points get half the weight of the others.
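One way to see where those 13 weights come from: they are what you get by averaging two adjacent 12-point equal-weight windows, which keeps the result centered even though 12 is even. The sketch below checks this numerically; the names w12 and pairAvg are illustrative.

```matlab
wts = [1/24; repmat(1/12, 11, 1); 1/24];   % weights from the question

w12 = ones(12, 1)/12;            % plain 12-point moving average
pairAvg = [1/2; 1/2];            % average of two neighboring outputs
combined = conv(w12, pairAvg);   % 13 weights

total = sum(wts);                        % weights sum to one
maxDiff = max(abs(combined - wts));      % difference between the two weight vectors
```

That construction suits monthly data with a 12-month cycle; for an equally weighted 9-day average, ones(9,1)/9 is the corresponding weight vector.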
I have two columns, x and y, of 100 points each. I would like to remove the outliers and fill the gaps with the average value of the neighboring points. Firstly, can I do that? Is there a MATLAB function for it? Secondly, if yes, what is the best technique to do that?
E.g:
x = 1:1:100
y = rand(1,99)
y(end+1)=2
In this case, which is not quite the same as my problem, I would like to remove the value 2 at the end and replace it with a value similar to its neighboring points. In my case the distribution of [x,y] is a nonlinear function with a few outliers.
It depends on what you mean by outlier. If you assume that outliers are more than three standard deviations from the median, for example, you could do this
all_idx = 1:length(x);
outlier_idx = abs(x - median(x)) > 3*std(x) | abs(y - median(y)) > 3*std(y); % Find outlier idx
x(outlier_idx) = interp1(all_idx(~outlier_idx), x(~outlier_idx), all_idx(outlier_idx), 'linear', 'extrap'); % Linearly interpolate over outlier idx for x
y(outlier_idx) = interp1(all_idx(~outlier_idx), y(~outlier_idx), all_idx(outlier_idx), 'linear', 'extrap'); % Do the same thing for y
This code replaces the outliers by linearly interpolating over their positions using the closest values that are not outliers. The 'linear', 'extrap' arguments matter when an outlier sits at either end of the data (as in your example), where interp1 would otherwise return NaN.
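Here is a small worked example of the same idea on made-up data with one obvious outlier in the middle. Only y is checked here, and the three-standard-deviation threshold is the assumption from above.

```matlab
y = sin(linspace(0, pi, 100));   % smooth nonlinear curve
y(50) = 5;                       % inject one outlier

idx = 1:numel(y);
isOut = abs(y - median(y)) > 3*std(y);   % logical mask of outliers

% Replace outliers by linear interpolation between the nearest good points;
% 'extrap' keeps an outlier at either end from turning into NaN
yClean = y;
yClean(isOut) = interp1(idx(~isOut), y(~isOut), idx(isOut), 'linear', 'extrap');
```

With evenly spaced indices, the replaced value is simply the average of the two neighbors, which is what the question asked for.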
I have 100 sampled numbers, and I need to draw the normal distribution curve of them in matlab.
The mean and standard deviation of these sampled data can be calculated easily, but is there any function that plots the normal distribution?
If you have access to Statistics Toolbox, the function histfit does what I think you need:
>> x = randn(10000,1);
>> histfit(x)
Just like with the hist command, you can also specify the number of bins, and you can also specify which distribution is used (by default, it's a normal distribution).
If you don't have Statistics Toolbox, you can reproduce a similar effect using a combination of the answers from @Gunther and @learnvst.
Use hist:
hist(data)
It draws a histogram plot of your data:
You can also specify the number of bins to draw, eg:
hist(data,5)
If you only want to draw the resulting pdf, create it yourself using:
mu=mean(data);
sg=std(data);
x=linspace(mu-4*sg,mu+4*sg,200);
pdfx=1/sqrt(2*pi)/sg*exp(-(x-mu).^2/(2*sg^2));
plot(x,pdfx);
You can overlay this on the previous hist plot, but you need to scale things first: the pdf integrates to 1, whereas the histogram bars are counts per bin, so multiply the pdf by the number of samples times the bin width.
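A sketch of that scaling, under the assumption of equal-width bins: multiply the pdf by the number of samples times the bin width so it is in the same units as the bar heights. The data vector and the choice of 10 bins are made up for illustration.

```matlab
data = 2 + randn(100, 1);                % hypothetical sample of 100 numbers

[counts, edges] = histcounts(data, 10);  % 10 equal-width bins
binWidth = edges(2) - edges(1);

mu = mean(data);
sg = std(data);
x = linspace(mu-4*sg, mu+4*sg, 200);
pdfx = exp(-(x-mu).^2/(2*sg^2)) / (sg*sqrt(2*pi));

histogram('BinEdges', edges, 'BinCounts', counts);
hold on
plot(x, pdfx*numel(data)*binWidth, 'r');   % pdf rescaled to counts per bin
hold off
```

The alternative is to leave the pdf alone and rescale the bars instead, for example with histogram(data, 'Normalization', 'pdf').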
If you want to draw a Gaussian distribution for your data, you can use the following code, replacing mean and standard deviation values with those calculated from your data set.
STD = 1;
MEAN = 2;
x = -4:0.1:4;
f = ( 1/(STD*sqrt(2*pi)) ) * exp(-0.5*((x-MEAN)/STD).^2 );
hold on; plot (x,f);
The array x in this example is the xaxis of your distribution, so change that to whatever range and sampling density you have.
If you want to draw your Gaussian fit over your data without the aid of the Statistics Toolbox, the following code will draw such a plot with correct scaling. Just replace y with your own data.
y = randn(1000,1) + 2;
x = -4:0.1:6;
n = hist(y,x);
bar (x,n);
MEAN = mean(y);
STD = sqrt(mean((y - MEAN).^2));
f = ( 1/(STD*sqrt(2*pi)) ) * exp(-0.5*((x-MEAN)/STD).^2 );
f = f*sum(n)/sum(f);
hold on; plot (x,f, 'r', 'LineWidth', 2);