Problem with processing function input variable in Matlab - matlab

So I am currently working with different datasets. Some are monthly, some daily, but I want quarterly. This is why I wrote the following function:
function y = average2(data, frequency)
% Monthly/Daily data to quarterly data by taking average
% INPUT data Nx2 monthly/daily data
% OUTPUT y Mx2 quarterly data
% USAGE average2(data)
if frequency == 'monthly';
K = 1:3:(length(data)-3);
quarterly = (data(K, 2)+data(K+1, 2)+data(K+2, 2))/3;
timevector = data(K, 1);
y = [timevector quarterly];
elseif frequency == 'daily';
y = data*data; %just as an example, not correct calculation
else frequency ~= 'daily' || 'monthly';
error('Requested frequency not available');
end
(The daily calculation is not the problem.) My problem is the following: if I use the monthly option, everything works fine. But every time I pass anything other than 'monthly' as frequency, I get the error message:
Matrix dimensions must agree.
Error in average2 (line 8)
if frequency == 'monthly';
So the elseif clause is never reached and the frequency input cannot be processed. Does anyone know where my mistake is? Thanks in advance.

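The == operator compares character arrays element by element, so both sides must have the same length. 'monthly' has 7 characters and 'daily' has 5, which is why you get "Matrix dimensions must agree". To compare strings, use the strcmp (case sensitive) or the strcmpi (case insensitive) functions.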
if(strcmp(frequency,'monthly'))
K = 1:3:(length(data)-3);
quarterly = (data(K, 2)+data(K+1, 2)+data(K+2, 2))/3;
timevector = data(K, 1);
y = [timevector quarterly];
elseif(strcmp(frequency,'daily'))
y = data*data; %just as an example, not correct calculation
else % frequency ~= 'daily' || 'monthly' % don't have to do this comparison and is not correctly coded
error('Requested frequency not available');
end
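A minimal sketch of the difference between == and strcmp, with a switch statement as one alternative way to dispatch on the frequency. The variable names eqFailed, isMonthly, isDaily and branch are illustrative only and are not part of the original function:

```matlab
frequency = 'daily';

% == works element-wise on char arrays, so unequal lengths raise an error.
eqFailed = false;
try
    tf = (frequency == 'monthly'); %#ok<NASGU>
catch
    eqFailed = true;
end

% strcmp always returns a single logical, whatever the lengths are.
isMonthly = strcmp(frequency, 'monthly');   % false
isDaily   = strcmp(frequency, 'daily');     % true

% switch also compares character vectors as whole strings.
switch lower(frequency)
    case 'monthly'
        branch = 'monthly';
    case 'daily'
        branch = 'daily';
    otherwise
        error('Requested frequency not available');
end
```

With switch, the otherwise branch plays the role of the final else, so no explicit "not daily and not monthly" test is needed.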

Related

How to compute confidence intervals and plot them on a bar plot

How can I make a bar plot out of data = 1x10 cell, where each element of the cell has a different dimension like 3x100, 3x40, 66x2 etc.?
My goal is to get a bar plot with 10 groups of bars and three bars in every group, one for each of the values. Each bar should show the median of the values, and I want to calculate the confidence interval and show it as well.
In this example there are no groups of bars, but it shows how I want the confidence intervals displayed. The site where I found this example offers a solution that uses this command line
e1 = errorbar(mean(data), ci95);
but my problem is that ci95 is not defined anywhere.
So, are there any other effective ways to do it, without installing or downloading additional packages?
I've found Patrick Happel's answer to not work because the figure window (and therefore the variable b) gets cleared out by subsequent calls to errorbar. Simply adding a hold on command takes care of this. To avoid confusion, here's a new answer that reproduces all of Patrick's original code, plus my small tweak:
%% Old answer
%Just to be safe, let's clear everything
clear all
data = cell(1,10);
% Random length of the data
l = randi(500, 10, 1) + 50;
% Random "width" of the data, with 3 more likely
w = randi(4, 10, 1);
w(w==4) = 3;
% random "direction" of the data
d = randi(2, 10, 1);
% sigma of the data (in fraction of mean)
sigma = rand(10,1) / 3;
% means of the data
dmean = randi(150,10,1);
dsigma = dmean.*sigma;
for c = 1 : 10
if d(c) == 1
data{c} = randn(l(c), w(c)) .* dsigma(c) + dmean(c);
else
data{c} = randn(w(c), l(c)) .* dsigma(c) + dmean(c);
end
end
%============================================
%Next thing is
% On the bar, I want it to be shown the median of the values, and I
% want to calculate the confidence interval and show it additionally.
%
%Are you really sure you want to plot the median? The median of some data
%is not connected to the variance of the data, and thus no type of error
%bars are required. I guess you want to show the mean. If you really want
%to show the median, a box plot might be a better alternative.
%
%The following code computes and plots the mean in a bar plot:
%============================================
means = zeros(numel(data),3);
stds = zeros(numel(data),3);
n = zeros(numel(data),3);
for c = 1:numel(data)
d = data{c};
if size(d,1) < size(d,2)
d = d';
end
cols = size(d,2);
means(c, 1:cols) = nanmean(d);
stds(c, 1:cols) = nanstd(d);
n(c, 1:cols) = sum(~isnan((d)));
end
b = bar(means);
%% New code
%This ensures that b continues to reference existing data in the next for
%loop, as the graphics objects can otherwise be deleted.
hold on
%% Continuing Patrick Happel's answer
%============================================
%Now, we need to compute the length of the error bars. Typical choices are
%the standard deviation of the data (already computed by the code above,
%stored in stds), the standard error or the 95% confidence interval (which
%is the 1.96fold of the standard error, assuming the underlying data
%follows a normal distribution).
%============================================
% for standard deviation use stds
% for standard error
ste = stds./sqrt(n);
% for 95% confidence interval
ci95 = 1.96 * ste;
%============================================
%Last thing is to plot the error bars. Here I chose the ci95 as you asked
%in your question, if you want to change that, simply change the variable
%in the call to errorbar:
%============================================
for c = 1:3
size(means(:, c))
size(b(c).XData)
e = errorbar(b(c).XData + b(c).XOffset, means(:,c), ci95(:, c));
e.LineStyle = 'none';
end
I am not sure what your data looks like. Since you stated in your question that the elements of the cell contain data with different dimensions like 3x100, 3x40, 66x2, I assume that your data can be arranged in columns or rows and that not all data requires three bars.
Since you did not provide a short piece of your data for us to test, I generate some artificial data:
data = cell(1,10);
% Random length of the data
l = randi(500, 10, 1) + 50;
% Random "width" of the data, with 3 more likely
w = randi(4, 10, 1);
w(w==4) = 3;
% random "direction" of the data
d = randi(2, 10, 1);
% sigma of the data (in fraction of mean)
sigma = rand(10,1) / 3;
% means of the data
dmean = randi(150,10,1);
dsigma = dmean.*sigma;
for c = 1 : 10
if d(c) == 1
data{c} = randn(l(c), w(c)) .* dsigma(c) + dmean(c);
else
data{c} = randn(w(c), l(c)) .* dsigma(c) + dmean(c);
end
end
Next thing is
On the bar, I want it to be shown the median of the values, and I want to calculate the confidence interval and show it additionally.
Are you really sure you want to plot the median? The median of some data is not connected to the variance of the data, and thus no type of error bars are required. I guess you want to show the mean. If you really want to show the median, a box plot might be a better alternative.
The following code computes and plots the mean in a bar plot:
means = zeros(numel(data),3);
stds = zeros(numel(data),3);
n = zeros(numel(data),3);
for c = 1:numel(data)
d = data{c};
if size(d,1) < size(d,2)
d = d';
end
cols = size(d,2);
means(c, 1:cols) = nanmean(d);
stds(c, 1:cols) = nanstd(d);
n(c, 1:cols) = sum(~isnan((d)));
end
b = bar(means);
Now we need to compute the length of the error bars. Typical choices are the standard deviation of the data (already computed by the code above and stored in stds), the standard error, or the 95% confidence interval (which is 1.96 times the standard error, assuming the underlying data follows a normal distribution).
% for standard deviation use stds
% for standard error
ste = stds./sqrt(n);
% for 95% confidence interval
ci95 = 1.96 * ste;
The last thing is to plot the error bars. Here I chose ci95, as you asked in your question. If you want to change that, simply change the variable in the call to errorbar:
for c = 1:3
size(means(:, c))
size(b(c).XData)
e = errorbar(b(c).XData + b(c).XOffset, means(:,c), ci95(:, c));
e.LineStyle = 'none';
end
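The factor 1.96 used above comes from the normal distribution and is a good approximation for large samples. For small samples, a common alternative is the Student t quantile. The following is a minimal sketch under that assumption; the data vector x is made up for illustration, and tinv requires the Statistics and Machine Learning Toolbox:

```matlab
% Hypothetical small sample (illustrative values only)
x = [4.1 5.0 4.7 5.3 4.9 5.6 4.4 5.1];
n = numel(x);

ste = std(x) / sqrt(n);                  % standard error of the mean

ciNormal = 1.96 * ste;                   % half-width, normal approximation
ciT      = tinv(0.975, n - 1) * ste;     % half-width, Student t with n-1 dof

fprintf('normal: +/- %.3f, t-based: +/- %.3f\n', ciNormal, ciT);
```

For the columns with several hundred values generated in the example data, the two half-widths are nearly identical, so the choice mainly matters when a group has only a handful of observations.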

How to generate the desired oscillation graph? [MATLAB]

I have a mathematical equation that describes a dynamical system as
The parameters are defined as follows
k1=1; S=1; Kd=1; p=2; tau=10; k2=1; ET=1; Km=1;
I coded the system as
y(1) = 1; % based on the y-axes starting point in the last figure
y(2) = y(1) + k1*S*Kd^p/(Kd^p + y(1)^p) - k2*ET*y(1)/(Km + y(1)); % to avoid errors
for t=1:100
y(t+1) = y(t+1) + (k1*S*Kd^p/(Kd^p + y(t)^p) - k2*ET*y(t+1)/(Km + y(t+1)));
end
plot(y);
Note that I did not use tau=10 for simplicity and instead used a delayed version by 1 instead of 10 (because I am not sure how to insert a delay of 10)
And obtained the following result
However, I need to obtain this
Can anyone help me rectify the mistake in my code?
Thanking you in advance.
If we assume that y(t) = 0 for t < 0, then your code can be modified to produce a similar plot. However, it looks like the plot you are trying to reproduce uses different initial conditions. If you're just looking to measure Tc, then it appears that the signal stabilizes with the period you're looking for.
k1=1; S=1; Kd=1; p=2; tau=10; k2=1; ET=1; Km=1;
% time step size (tau MUST be divisible by dt to ensure proper array indexing)
dt = 0.01;
% time series
t = -10:dt:100;
% initialize y to all zeros so that y(t)=0 for all t<0 (initial condition)
y = zeros(size(t));
% Find starting and ending indexes to iterate from t=0 to t=100-dt
idx0 = find(t == 0);
idx1 = numel(t)-1;
% initial condition y(0) = 1
y(idx0) = 1;
for n = idx0:idx1
% The indexing used here ensures the following equivalences.
% y(n+1) = y(t+dt)
% y(n) = y(t)
% y(n - round(tau/dt)) = y(t-tau)
%
% Note that (y(t+dt)-y(t))/dt is approximately y'(t)
% Solving for y(t+dt) we get the following formula
y(n+1) = y(n) + dt*((k1*S*Kd^p/(Kd^p + y(n - round(tau/dt))^p) - k2*ET*y(n)/(Km + y(n))));
end
% plot y(t) for t > 0
plot(t(t>0),y(t>0));
Result
Seeing as things stabilize we can take the values in one of the periods and use those for the initial conditions and we get.
Edit: To elaborate, the function contains a delay of 10 which means that instead of just a single initial condition at y(0), we also need to initialize all values from t=-10 to 0. In the code posted in this answer I arbitrarily assumed that y(t) = 0 for t < 0 and y(0) = 1 because I don't know otherwise. Once we run the code and see that the signal becomes periodic we can borrow the values from one of these periods to use those as the initial conditions.
From the diagram you posted we can use our intuition to guess that, before time 0, the signal probably looks something like the region highlighted in the figure below.
If, rather than using zero to initialize y for t < 0, we copy the values in the red highlighted region, then we get a plot that is more like what you desire.
To get the plot shown above I ran the script once, then found the indices in y for the part I wanted to use as initial conditions, then copied those into a new array.
init_cond = y(7004:8004);
Then I changed the script to use this array as the initial condition and changed the initialization of y to
y = zeros(size(t));
y(1:1001) = init_cond;
and ran the modified script again.
Edit 2: The built-in function dde23 appears to be applicable for your problem. To see an example run the command edit ddex1 in the command window.
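A minimal sketch of how the system might be passed to dde23, assuming the delayed equation is y'(t) = k1*S*Kd^p/(Kd^p + y(t-tau)^p) - k2*ET*y(t)/(Km + y(t)), as in the Euler update above. The constant history y(t) = 0 for t <= 0 is an assumption made here for illustration (it differs slightly from the y(0) = 1 start used in the loop), so the transient will not match the plots exactly:

```matlab
k1=1; S=1; Kd=1; p=2; tau=10; k2=1; ET=1; Km=1;

% Right-hand side: y is y(t), Z is y(t - tau)
ddefun = @(t, y, Z) k1*S*Kd^p/(Kd^p + Z^p) - k2*ET*y/(Km + y);

% Assumed constant history: y(t) = 0 for t <= 0
history = 0;

% Solve on [0, 100] with a single constant lag tau
sol = dde23(ddefun, tau, history, [0 100]);

plot(sol.x, sol.y);
xlabel('t'); ylabel('y(t)');
```

Unlike the fixed-step loop, dde23 chooses its own step sizes, so there is no need for tau to be a multiple of a step dt.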

Finding the difference between two signals

I have two signals, let's call them 'a' and 'b'. They are both nearly identical signals (recorded from the same input and contain the same information) however, because I recorded them at two different 'b' is time shifted by an unknown amount. Obviously, there is random noise in each.
Currently, I am using cross correlation to compute the time shift, however, I am still getting improper results.
Here is the code I am using to calculate the time shift:
function [ diff ] = FindDiff( signal1, signal2 )
%FINDDIFF Finds the difference between two signals of equal frequency
%after an appropritate time shift is applied
% Calculates the time shift between two signals of equal frequency
% using cross correlation, shifts the second signal and subtracts the
% shifted signal from the first signal. This difference is returned.
length = size(signal1);
if (length ~= size(signal2))
error('Vectors must be equal size');
end
t = 1:length;
tx = (-length+1):length;
x = xcorr(signal1,signal2);
[mx,ix] = max(x);
lag = abs(tx(ix));
shifted_signal2 = timeshift(signal2,lag);
diff = signal1 - shifted_signal2;
end
function [ shifted ] = timeshift( input_signal, shift_amount )
input_size = size(input_signal);
shifted = (1:input_size)';
for i = 1:input_size
if i <= shift_amount
shifted(i) = 0;
else
shifted(i) = input_signal(i-shift_amount);
end
end
end
plot(FindDiff(a,b));
However, the result from the function is a periodic wave rather than random noise, so the lag must still be off. I would post an image of the plot, but imgur is currently not cooperating.
Is there a more accurate way to calculate lag other than cross correlation, or is there a way to improve the results from cross correlation?
Cross-correlation is usually the simplest way to determine the time lag between two signals. The position of peak value indicates the time offset at which the two signals are the most similar.
%// Normalize signals to zero mean and unit variance
s1 = (signal1 - mean(signal1)) / std(signal1);
s2 = (signal2 - mean(signal2)) / std(signal2);
%// Compute time lag between signals
[c, lags] = xcorr(s1, s2); %// Cross correlation and the lag of each sample
[~, imax] = max(c); %// Find the position of the peak
lag = lags(imax) %// Positive lag: s1 is delayed relative to s2
Note that the two signals have to be normalized first to the same energy level, so that the results are not biased.
By the way, don't use diff as a name for a variable. There's already a built-in function in MATLAB with the same name.
Now there are two functions in Matlab:
one called finddelay
and another called alignsignals that can do what you want, I believe.
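As a minimal sketch of how finddelay might be used, the following builds a synthetic signal, delays it by a known number of samples, and recovers the shift. The signals and the names trueShift, bAligned and residual are made up for illustration; finddelay requires the Signal Processing Toolbox, and the alignment step assumes the estimated delay is non-negative:

```matlab
rng(0);                                  % reproducible random signal
a = randn(200, 1);                       % hypothetical reference signal
trueShift = 10;
b = [zeros(trueShift, 1); a(1:end-trueShift)];   % b is a delayed by 10 samples

d = finddelay(a, b);                     % positive when b lags a

% Undo the estimated delay and compare the overlapping part
bAligned = b(d+1:end);
residual = a(1:numel(bAligned)) - bAligned;
```

With real recordings the residual will contain the noise of both signals, so it should look random rather than periodic once the lag is right.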
Cross-correlation computes a dot product between the vectors (v1, v2) at each shift. If it works badly with your signal, I'd try minimizing the sum of squared differences instead (i.e. sum((v1 - v2).^2)).
signal = sin(1:100);
signal1 = [zeros(1, 10) signal];
signal2 = [signal zeros(1, 10)];
for i = 1:length(signal1)
signal1shifted = [signal1 zeros(1, i)];
signal2shifted = [zeros(1, i) signal2];
d2(i) = sum((signal1shifted - signal2shifted).^2);
end
[fval lag2] = min(d2);
lag2
It is computationally more expensive than cross-correlation, which can be sped up by using the FFT. As far as I know you can't do this with Euclidean distance.
UPD. Deleted wrong idea about cross-correlation with periodic signals
You can try matched filtering in the frequency domain:
function [corr_out] = pc_corr_processor (target_signal, ref_signal)
L = length(ref_signal);
N = length(target_signal);
matched_filter = flipud(ref_signal')';
matched_filter_Res = fft(matched_filter,N);
corr_fft = matched_filter_Res.*fft(target_signal);
corr_out = abs(ifft(corr_fft));
The index of the maximum of corr_out, i.e. the peak of the matched filter output, should give you the lag amount.

How to calculate p-value for t-test in MATLAB?

Is there a simple way of calculating the p-value of a t-test in MATLAB?
I found something like it however I think that it does not return correct values:
Pval=2*(1-tcdf(abs(t),n-2))
I want to calculate the p-value for the test that the slope of regression is equal to 0. Therefore I calculate the Standard Error
$SE = \sqrt{\frac{\sum_{s=i-w}^{i+w}(y_{s}-\widehat{y}_{s})^2}{(w-2)\sum_{s=i-w}^{i+w}(x_{s}-\bar{x})^2}}$
where $y_s$ is the value of analyzed parameter in time period $s$,
$\widehat{y}_s$ is the estimated value of the analyzed parameter in time period $s$,
$x_s$ is the time point of the observed value of the analysed parameter,
$\bar{x}$ is the mean of the time points from the analysed period, and then
$t_{score} = (a - a_{0})/SE$ where $a_{0} = 0$.
I compared the p-values from the built-in function with the ones calculated using this formula:
% Let n be your sample size
% Let v be your degrees of freedom
% Then:
pvalues = 2*(1-tcdf(abs(t),n-v))
and they are the same!
Example with Matlab demo dataset:
load accidents
x = hwydata(:,2:3);
y = hwydata(:,4);
stats = regstats(y,x,eye(size(x,2)));
fprintf('T stat using built-in function: \t %.4f\n', stats.tstat.t);
fprintf('P value using built-in function: \t %.4f\n', stats.tstat.pval);
fprintf('\n\n');
n = size(x,1);
v = size(x,2);
b = x\y;
se = diag(sqrt(sumsqr(y-x*b)/(n-v)*inv(x'*x)));
t = b./se;
p = 2*(1-tcdf(abs(t),n-v));
fprintf('T stat using own calculation: \t\t %.4f\n', t);
fprintf('P value using own calculation: \t\t %.4f\n', p);
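For the specific case in the question, a test that the slope of a simple linear regression is zero, the calculation can be sketched as follows. The data are made up for illustration, the degrees of freedom are n-2 because an intercept and a slope are estimated, and tcdf requires the Statistics and Machine Learning Toolbox. Writing the p-value as 2*tcdf(-abs(t), df) is mathematically the same as 2*(1-tcdf(abs(t), df)) but avoids subtracting two nearly equal numbers when t is large:

```matlab
% Hypothetical data: a line with slope 2 plus small fixed perturbations
x = (1:10)';
y = 2*x + 1 + [0.3 -0.2 0.1 -0.4 0.2 0.0 -0.1 0.3 -0.3 0.1]';
n = numel(x);

% Least-squares fit of y = b(1) + b(2)*x
X = [ones(n, 1) x];
b = X \ y;

% Standard error of the slope
res = y - X*b;                           % residuals
s2  = sum(res.^2) / (n - 2);             % residual variance, n-2 dof
se  = sqrt(s2 / sum((x - mean(x)).^2));

% t statistic and two-sided p-value for H0: slope = 0
t = b(2) / se;
p = 2 * tcdf(-abs(t), n - 2);

fprintf('slope = %.4f, t = %.2f, p = %.3g\n', b(2), t, p);
```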

Creating and training a perceptron in MATLAB for gender classification

I am coding a perceptron to learn to categorize gender in pictures of faces. I am very very new to MATLAB, so I need a lot of help. I have a few questions:
I am trying to code for a function:
function [y] = testset(x,w)
%y = sign(sigma(x*w-threshold))
where y is the predicted results, x is the training/testing set put in as a very large matrix, and w is weight on the equation. The part after the % is what I am trying to write, but I do not know how to write this in MATLAB code. Any ideas out there?
I am trying to code a second function:
function [err] = testerror(x,w,y)
%err = sigma(max(0,-w*x*y))
w, x, and y have the same values as stated above, and err is my function of error, which I am trying to minimize through the steps of the perceptron.
I am trying to create a step in my perceptron to lower the percent of error by using gradient descent on my original equation. Does anyone know how I can increment w using gradient descent in order to minimize the error function using an if then statement?
I can put up the code I have up till now if that would help you answer these questions.
Thank you!
edit--------------------------
OK, so I am still working on the code for this, and would like to put it up when I have something more complete. My biggest question right now is:
I have the following function:
function [y] = testset(x,w)
y = sign(sum(x*w-threshold))
Now I know that I am supposed to put a threshold in, but cannot figure out what I am supposed to put in as the threshold! any ideas out there?
edit----------------------------
this is what I have so far. Changes still need to be made to it, but I would appreciate input, especially regarding structure, and advice for making the changes that need to be made!
function [y] = Perceptron_Aviva(X,w)
y = sign(sum(X*w-1));
end
function [err] = testerror(X,w,y)
err = sum(max(0,-w*X*y));
end
%function [w] = perceptron(X,Y,w_init)
%w = w_init;
%end
%------------------------------
% input samples
X = X_train;
% output class [-1,+1];
Y = y_train;
% init weigth vector
w_init = zeros(size(X,1));
w = w_init;
%---------------------------------------------
loopcounter = 0
while abs(err) > 0.1 && loopcounter < 100
for j=1:size(X,1)
approx_y(j) = Perceptron_Aviva(X(j),w(j))
err = testerror(X(j),w(j),approx_y(j))
if err > 0 %wrong (structure is correct, test is wrong)
w(j) = w(j) - 0.1 %wrong
elseif err < 0 %wrong
w(j) = w(j) + 0.1 %wrong
end
% -----------
% if sign(w'*X(:,j)) ~= Y(j) %wrong decision?
% w = w + X(:,j) * Y(j); %then add (or subtract) this point to w
end
You can read this question that I asked some time ago.
It uses MATLAB code and a function perceptron:
function [w] = perceptron(X,Y,w_init)
w = w_init;
for iteration = 1 : 100 %<- in practice, use some stopping criterion!
for ii = 1 : size(X,2) %cycle through training set
if sign(w'*X(:,ii)) ~= Y(ii) %wrong decision?
w = w + X(:,ii) * Y(ii); %then add (or subtract) this point to w
end
end
sum(sign(w'*X)~=Y)/size(X,2) %show misclassification rate
end
and it is called from code (by @Itamar Katz) like this, using random data:
% input samples
X1=[rand(1,100);rand(1,100);ones(1,100)]; % class '-1' (see Y below)
X2=[rand(1,100);1+rand(1,100);ones(1,100)]; % class '+1' (see Y below)
X=[X1,X2];
% output class [-1,+1];
Y=[-ones(1,100),ones(1,100)];
% init weight vector
w=[.5 .5 .5]';
% call perceptron
wtag=perceptron(X,Y,w);
% predict
ytag=wtag'*X;
% plot prediction over original data
figure;hold on
plot(X1(1,:),X1(2,:),'b.')
plot(X2(1,:),X2(2,:),'r.')
plot(X(1,ytag<0),X(2,ytag<0),'bo')
plot(X(1,ytag>0),X(2,ytag>0),'ro')
legend('class -1','class +1','pred -1','pred +1')
I guess this can give you an idea to make the functions you described.
To compute the error, compare the expected result with the actual result (class).
Assume your dataset is X, the data points, and Y, the labels of the classes.
f=newp(X,Y)
creates a perceptron.
If you want to create an MLP then:
f=newff(X,Y,NN)
where NN is the network architecture, i.e. an array that designates the number of neurons at each hidden layer. For example
NN=[5 3 2]
will correspond to a network with 5 neurons in the first hidden layer, 3 in the second and 2 in the third.
Well what you call threshold is the Bias in machine learning nomenclature. This should be left as an input for the user because it is used during training.
Also, I wonder why you are not using the built-in MATLAB functions, i.e. newp or newff. For example:
ff=newp(X,Y)
Then you can set the properties of the object ff to do your job for selecting gradient descent and so on.
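For the hand-written route, here is a minimal sketch of the two functions from the question with the threshold treated as a bias. It assumes samples are stored as columns of X and that a row of ones is appended, so the bias is simply the last weight and no separate threshold is needed. The toy data and the names predict, perr and misclassified are illustrative only, and the update rule is the classic perceptron rule from the answer above rather than a tuned gradient-descent scheme:

```matlab
% Hypothetical, linearly separable toy data: 2 features plus a row of ones
rng(1);
X = [rand(2, 50), rand(2, 50) + 1.5; ones(1, 100)];   % 3 x 100
Y = [-ones(1, 50), ones(1, 50)];                      % labels in {-1, +1}

% y = sign(w'*x); the threshold is folded into the last weight
predict = @(X, w) sign(w' * X);

% Perceptron criterion: sum over samples of max(0, -y * w'*x)
perr = @(X, w, Y) sum(max(0, -(w' * X) .* Y));

w = zeros(3, 1);
for epoch = 1:1000
    updated = false;
    for j = 1:size(X, 2)
        if sign(w' * X(:, j)) ~= Y(j)        % wrong decision?
            w = w + X(:, j) * Y(j);          % move w towards this sample
            updated = true;
        end
    end
    if ~updated                              % a full pass without mistakes
        break;
    end
end

misclassified = sum(predict(X, w) ~= Y);
fprintf('misclassified: %d, error: %.4f\n', misclassified, perr(X, w, Y));
```

The loop stops as soon as one pass over the training set produces no update, which only happens when the data are linearly separable; for real face images a cap on the number of epochs, as used here, is still needed.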