I am currently working on my undergraduate thesis on complex-valued neural networks (CVNNs). My topic is a single-layer complex-valued neural network for real-valued classification problems. I am using the gradient-descent learning rule to classify the dataset given below:
Data Set
The algorithm I used here can be found on page 946 of the following PDF, labeled as the complex-valued neuron (CVN) model. The main algorithm is in Section 3 of that paper:
Algorithm of CVN Model
But instead of converging, my error curve shows divergent behavior. Here is my output of the error curve:
error curve at CVNN implementation
I am simulating this in MATLAB. My implementation is given below:
clc
clear all
epoch=1000;
n=8;
%x=real input value
in=dlmread('Diabetes1.txt');
x=in(1:384,1:8);
%d=desired output value
data_1=in(1:384,9);
data_2=in(1:384,10);
%m=complex representation of input
m=(cos((pi).*(x(:,:)-0))+1i*sin((pi).*(x(:,:)-0)));
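% (equivalent to m = exp(1i*pi*x): each real input is phase-encoded on the unit circle)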
%
%research
%m=i.*x(:,:)
%m=x(:,:)+i.*x(:,:)
%Wih=weight
%
%m=x(:,:).*(cos(pi./4)+i.*sin(pi./4));
Wih1 =0.5* exp(1i * 2*pi*rand(8,1));
Wih2 =0.5* exp(1i * 2*pi*rand(8,1));
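% weights start with magnitude 0.5 and a uniformly random phase; the biases below are initialized the same way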
%Pih=bias
Pih1 =0.5*exp(1i * 2*pi*rand(1,1));
Pih2 =0.5*exp(1i * 2*pi*rand(1,1));
for ite=1:epoch
% www(ite)=ite;
E_Total=0;
E1t=0;
E2t=0;
for j=1:384
%blr=learning rate
blr=0.1;
%cpat=current pattern
cpat = m(j,:);
z1=cpat*Wih1+Pih1;
u1=real(z1);
v1=imag(z1);
fu1=1/(1+exp(-u1));
fv1=1/(1+exp(-v1));
%y=actual output
%for activation function 1
y1=sqrt((fu1).^2+(fv1).^2);
%for activation function 2
% y1=(fu1-fv1).^2;
error1=(data_1(j,1)-y1);
E1=((data_1(j,1)-y1).^2);
t11=1./(1+exp(-u1));
f11=t11.*(1-t11);
t21=1./(1+exp(-v1));
f21=t21.*(1-t21);
%for activation function 1
r1= blr.*(data_1(j,1)-y1).*((t11.*f11)./y1)+1i.*blr.*(data_1(j,1)-y1).*((t21.*f21)./y1);
%for activation function 2
%r1=2.*blr.*(data_1(j,1)-y1).*(t11-t21).*f11+1i.*2.*blr.*(data_1(j,1)-y1).*(t21-t11).*f21;
%
Pih1=Pih1+r1;
Wih1= Wih1+(conj(m(j,:)))'.*r1;
%////////////////////////////////////////////////
%cpat=current pattern
z2=cpat*Wih2+Pih2;
u2=real(z2);
v2=imag(z2);
fu2=1./(1+exp(-u2));
fv2=1./(1+exp(-v2));
% fu2=tanh(u2);
% fv2=tanh(v2);
%y=actual output
%for activation function 1
y2=sqrt((fu2).^2+(fv2).^2);
%for activation function 2
% y2=(fu2-fv2).^2;
error2=(data_2(j,1)-y2);
E2=((data_2(j,1)-y2).^2);
t12=1./(1+exp(-u2));
f12=t12.*(1-t12);
t22=1./(1+exp(-v2));
f22=t22.*(1-t22);
%for activation function1
r2= blr.*(data_2(j,1)-y2).*((t12.*f12)./y2)+1i.*blr.*(data_2(j,1)-y2).*((t22.*f22)./y2);
%for activation function 2
%r2=2*blr*(data_2(j,1)-y2)*(t12-t22)*f12+1i*2*blr*(data_2(j,1)-y2)*(t22-t12)*f22;
Pih2=Pih2+r2;
Wih2= Wih2+(conj(m(j,:)))'.*r2;
%///////////////////////////////////////////////
E1t=E1+E1t;
E2t=E2+E2t;
E_Total=(E1+E2+E_Total);
end
Err=E_Total/(2*384);
figure(1)
plot(ite,Err,'b.') % a single point drawn with line style 'b--' renders nothing, so use a marker
hold on;
end
dlmwrite('weight.txt',Wih1)
dlmwrite('weight.txt', Wih2, '-append', ...
'roffset', 1, 'delimiter', ' ')
dlmwrite('weight.txt', Pih1, '-append', ...
'roffset', 1, 'delimiter', ' ')
dlmwrite('weight.txt', Pih2, '-append', ...
'roffset', 1, 'delimiter', ' ')
I still cannot figure out the reason behind this divergent behavior on this dataset, so any kind of help regarding this is appreciated.
If you are doing gradient descent, a very common debugging technique is to check whether the gradient you calculated actually matches the numerical gradient of your loss function.
That is, check that

f(x+dx) - f(x) ≈ f'(x)*dx

for a variety of small dx. Try this along each coordinate dimension, as well as in a variety of random directions, and repeat the check for a variety of values of x.
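A minimal finite-difference checker in MATLAB (a sketch; loss_fn and grad are hypothetical stand-ins for your own loss function and analytic gradient):

% compare the analytic gradient against central differences, one dimension at a time
% loss_fn: handle to your scalar loss; grad: your analytic gradient vector (both assumed)
function check_gradient(loss_fn, grad, w0)
h = 1e-6;
for k = 1:numel(w0)
    dw = zeros(size(w0));
    dw(k) = h;
    num_grad = (loss_fn(w0+dw) - loss_fn(w0-dw)) / (2*h); % numerical derivative
    fprintf('dim %d: analytic %.6g, numerical %.6g\n', k, grad(k), num_grad);
end
% for complex weights, run the check on the real and imaginary parts separately
end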
You should take a look at this blog for complex back-propagation.
For holomorphic functions, complex BP is fairly straightforward.
For non-holomorphic functions (and every CVNN must have at least one non-holomorphic function), it needs careful treatment.
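The standard tool here is the Wirtinger calculus: for a real-valued loss E of a complex weight w = u + iv,

\begin{equation}
\frac{\partial E}{\partial \bar{w}} = \frac{1}{2}\left(\frac{\partial E}{\partial u} + i\,\frac{\partial E}{\partial v}\right),
\end{equation}

and the steepest-descent update is w \leftarrow w - \eta\,\partial E/\partial\bar{w}. For a neuron z = w·x + b this is what brings the conjugate of the input into the weight update. (One MATLAB pitfall worth checking in the code above: (conj(m(j,:)))' conjugates twice, since ' is the conjugate transpose, and so yields the unconjugated transpose m(j,:).'; to keep the conjugate, write conj(m(j,:)).' or simply m(j,:)'.)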
Related
This is my first go at ML (and MATLAB), and I'm following "Learning From Data" by Yaser S. Abu-Mostafa.
I'm trying to implement the perceptron algorithm. After working through the pseudocode and other people's solutions, I can't seem to fix my problem (I went through other threads too).
The algorithm separates the data fine; it works. However, I want to plot a single separating line, but it seems to separate the data in such a way that the '-1' cluster is divided into a second cluster or more.
This is the code:
iterations = 100;
dim = 3;
X1=[rand(1,dim);rand(1,dim);ones(1,dim)]; % class '-1' (matches Y below)
X2=[rand(1,dim);1+rand(1,dim);ones(1,dim)]; % class '+1'
X=[X1,X2];
Y=[-ones(1,dim),ones(1,dim)];
w=[0,0,0]';
% call perceptron
wtag=weight(X,Y,w,iterations);
% predict
ytag=wtag'*X;
% plot prediction over original data
figure;hold on
plot(X1(1,:),X1(2,:),'b.')
plot(X2(1,:),X2(2,:),'r.')
plot(X(1,ytag<0),X(2,ytag<0),'bo')
plot(X(1,ytag>0),X(2,ytag>0),'ro')
legend('class -1','class +1','pred -1','pred +1')
%Why don't I get just one line?
plot(X,Y);
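% note: with a 3-by-6 matrix X and a 1-by-6 Y, plot(X,Y) draws one line per row of X against Y, hence multiple lines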
The weight function (Perceptron):
function [w] = weight(X,Y,w_init,iterations)
%WEIGHT Summary of this function goes here
% Detailed explanation goes here
w = w_init;
for iteration = 1 : iterations %<- was 100!
for ii = 1 : size(X,2) %cycle through training set
if sign(w'*X(:,ii)) ~= Y(ii) %wrong decision?
w = w + X(:,ii) * Y(ii); %then add (or subtract) this point to w
end
end
fprintf('misclassification rate: %.3f\n', sum(sign(w'*X)~=Y)/size(X,2)); %show misclassification rate (a bare expression ending in ';' displays nothing)
end
I don't think the problem is in the second function, but I added it regardless.
I'm pretty sure the algorithm separates the data into more than one cluster, but I can't tell why. Most of the learning I've done so far was math and theory rather than actual coding, so I'm probably missing something obvious.
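For reference, a single separating line can be drawn directly from the learned weights. A minimal sketch, assuming wtag = [w1; w2; b] with the third component acting as the bias (as in the code above, where the third feature row is all ones):

% the boundary satisfies w1*x + w2*y + b = 0, i.e. y = -(w1*x + b)/w2
xs = linspace(min(X(1,:)), max(X(1,:)), 100);
ys = -(wtag(1)*xs + wtag(3)) / wtag(2);
plot(xs, ys, 'k-') % one straight line separating the two classes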
I am new to neural networks and am trying to write a wavelet neural network without the MATLAB toolbox.
For function approximation, the sigmoid activation function's output is between 0 and 1, and to calculate the error we use the sum of squares; this is what I know.
But after some iterations the result shows NaN (not a number) in MATLAB. Perhaps it is the change in the NN's (neural network's) weight parameters; sometimes it doesn't show the NaN, but at the end, when testing the NN, every input produces the same constant value.
For more information about the NN:
3 layers ==> 1 input in the input layer, 20 units in the hidden layer, 1 output in the output layer
This is a part of the code:
--------------------------------------------------------
x=0:0.1*pi:pi;
y=sin(x);
HiddenLayer=20;%Number of Units in HiddenLayer
M=size(x,1);%Number of row in InputData
N=size(x,2);%Number of col in InputData
% % -------InitialValues------
w=zeros(1,HiddenLayer);
w(1,:)=1;%Initial
a=zeros(M,HiddenLayer);
a(:,:)=0.4;%Initial
b=zeros(M,HiddenLayer);
b(:,:)=0.5;
miu=0.2;%Number
beta=0.05;%Number
ErrorGoal=0.0001;%Number
%---------------Main--------------------
Iteration=1;
while Iteration<1000
ErrorSave=zeros(1,N);
z=zeros(1,HiddenLayer);
p=zeros(1,HiddenLayer);
multiwp=zeros(1,HiddenLayer);
for u=1:N % for more than one input if needed
% keyboard;
for i=1:HiddenLayer
z(i)=((x(u)-b(i))/a(i));
end
for j=1:HiddenLayer
p(j)=cos(1.75*z(j))*exp((-z(j)^2)/2);
end
for k=1:HiddenLayer
multiwp(k)=w(k)*p(k);
end
sumwp=sum(multiwp);
gTotal=1/(1+exp(-sumwp)); % logistic sigmoid; avoids the Fuzzy Logic Toolbox dependency of sigmf(sumwp,[1 0])
Error=0.5*(y(u)-gTotal).^2;
% keyboard;
ErrorSave(u)=Error;
% keyboard;
gradEW=-(y(u)-gTotal).*gTotal.*(1-gTotal)*cos(1.75*z);
gradEB=zeros(M,HiddenLayer);
for m=1:HiddenLayer
gradEB(m)=-(y(u)-gTotal).*gTotal.*(1-gTotal).*(w(m)./a(m)).*exp(-(z(m).^2)/2).*...
((1.75*sin(1.75*z(m)))+(z(m)*cos(1.75*z(m))));
end
gradEA=zeros(M,HiddenLayer);
for o=1:HiddenLayer
gradEA(o)=-(y(u)-gTotal).*gTotal.*(1-gTotal)*z(o).*(w(o)./a(o))*exp(-(z(o).^2)/2).*...
((1.75*sin(1.75*z(o)))+(z(o)*cos(1.75*z(o))));
end
% keyboard;
%------------Updating Parameter ---------------
w=w-(miu*gradEW)+beta*w;
b=b+(miu*gradEB)+beta*b;
a=a+(miu*gradEA)+beta*a;
end
Iteration=Iteration+1; % without this increment the while-loop never terminates
end
----------------------------------------------------------------------
If anyone can help and point out my mistake in concept or coding.
Best wishes :)
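One quick way to localize this kind of failure is to stop at the first non-finite value and inspect the magnitudes involved. A minimal sketch to drop inside the training loop, right after the parameter updates (it only uses variables already defined above):

% stop at the first NaN/Inf so the offending update can be inspected
if any(~isfinite(w(:))) || any(~isfinite(a(:))) || any(~isfinite(b(:)))
    fprintf('non-finite parameter at iteration %d, sample %d\n', Iteration, u);
    keyboard; % inspect w, a, b, z, gTotal here
end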
I implemented a multilayer perceptron with one hidden layer on the MNIST dataset. The activation function in the hidden layer is leaky (0.01) ReLU, and the output layer has a softmax activation function. The learning method is mini-batch SGD. The network structure is 784*30*10. The problem is that the predictions the network makes are quite similar for every input sample: the model always tends to think the image is some particular number. Thanks to #Lemm Ras for pointing out the label-data mismatch in the previous data_shuffle function, which is now fixed. But after some batch training I found the predictions are still rather similar, which is confusing.
Another issue is that the update values are too small compared with the original weights. In the MLP code I added the variables 'cc' and 'dd' to record the ratio between the weight updates and the weights:
cc=W_OUTPUT_Update./W_OUTPUT;
dd=W_MLP_Update./W_MLP;
During debugging, the magnitude of cc is about 10^-4 (0.0001), and dd is also about 10^-4. This might be the reason the accuracy doesn't seem to improve much.
After several days of debugging I have no idea why this happens or how to solve it; it has had me stuck for a week. Can someone help me, please?
The screenshot shows the values of A2 after the softmax function.
[dimension, images, labels, labels_matrix, train_amount, test_labels_matrix, test_images, test_labels, test_amount] = load_mnist_data(); %initialize str
images=images(:,1:10000); % for debugging, get part of whole data set
labels=labels(1:10000,1);
labels_matrix=labels_matrix(:,1:10000);
test_images=test_images(:,1:500);
test_labels=test_labels(1:500,1);
train_amount=10000;
test_amount=500;
% initialize the structure
[ W_MAD, W_MLP, W_OUTPUT] = initialize_structure(dimension, train_amount, test_amount);
epoch=100;
correct_rate=zeros(1,epoch); %record testing accuracy
corr=zeros(1,epoch); %record training accuracy
lr=0.2;
lamda=0;
batch_size=50;
for i=1:epoch
fprintf('MLP in iteration %d over %d\n', i, epoch);
%shuffle data
[labels_shuffled labels_matrix_shuffled images_shuffled]=shuffle_data(labels, labels_matrix,images);
[ cor, W_MLP, W_OUTPUT ] = train_mlp_relu(lr, leaky, lamda, momentum_gamma, batch_size,W_MLP, W_OUTPUT, W_MAD, power, images_shuffled, train_amount, labels_shuffled, labels_matrix_shuffled);
corr(i)=cor/train_amount;
% test
correct_rate(i) = structure_test( W_MAD, W_MLP, W_OUTPUT, test_images, test_labels, test_amount );
end
% plot results
plot(1:epoch,correct_rate);
Here's the MLP training function; please ignore the L2 regularization parameter lamda, which is currently set to 0.
%MLP with batch size batch_size
cor=0;
%leaky=(1/batch_size);
leaky=0.001;
for i=1:train_amount/batch_size
batch_images=images(:,batch_size*(i-1)+1:batch_size*i);
batch_labels=labels_matrix(:,batch_size*(i-1)+1:batch_size*i);
%from MAD to MLP
V1=W_MLP'*batch_images;
V1(1,:)=1; %set bias unit as 1
V1_derivative=ones(size(V1));
V1_derivative(V1<0)=leaky;
A1=relu(V1,leaky); % A stands for activation
V2=W_OUTPUT'* A1;
A2=softmax(V2);
%write these scope control codes into functions.
%train error
[val idx]=max(A2);
idx=idx-1; %because the matrix index (idx) varies from 1 to 10 while the label varies from 0 to 9
res=labels(batch_size*(i-1)+1:batch_size*i)-idx';
cor=cor+sum(res(:)==0);
%softmax loss; with ReLU, nodes that contributed to activating a neuron (input > 0)
%have gradient 1, while nodes below 0 contribute only the leaky slope
delta_softmax=-(1/batch_size)*(batch_labels-A2);
delta_output=W_OUTPUT*delta_softmax.*V1_derivative;
%update
W_OUTPUT_Update=lr*(1/batch_size)*A1*delta_softmax'+lamda*W_OUTPUT;
cc=W_OUTPUT_Update./W_OUTPUT;
W_MLP_Update=lr*(1/batch_size)*batch_images*delta_output'+lamda*W_MLP;
dd=W_MLP_Update./W_MLP;
k=mean(A2,2);
W_OUTPUT=W_OUTPUT-W_OUTPUT_Update;
W_MLP=W_MLP-W_MLP_Update;
end
end
Here is the softmax function:
function [ val ] = softmax( val )
val=exp(val-repmat(max(val,[],1),10,1)); % subtract each column's max first to avoid overflow (mathematically identical)
val=val./repmat(sum(val),10,1);
end
The labels_matrix is the aimed output matrix for A2 and created as:
labels_matrix=full(sparse(labels+1,1:train_amount,1));
test_labels_matrix=full(sparse(test_labels+1,1:test_amount,1));
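% i.e., a one-hot encoding: column n contains a single 1 in row labels(n)+1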
And Relu:
function [ val ] = relu( val,leaky )
val(val<0)=leaky*val(val<0); % logical indexing; find() is unnecessary
end
Data shuffle
%this version is wrong: it shuffles only the labels and data, without applying the same shuffle to 'labels_matrix' (which is used to compute the MLP's output-layer delta), so it destroys the link between data and labels.
% function [ label, data ] = shuffle_data( label, data )
% [row column]=size(data);
% array=randperm(column);
% data=data(:,array);
% label=label(array);
% %if shuffle respect to row then use the code below
% %data=data(randperm(row),:);
% end
function [ label, label_matrix, data ] = shuffle_data( label, label_matrix, data )
[row column]=size(data);
array=randperm(column);
data=data(:,array);
label=label(array);
label_matrix=label_matrix(:, array);
%if shuffle respect to row then use the code below
%data=data(randperm(row),:);
end
Data loading:
function [ dimension, images, labels, labels_matrix, train_amount, test_labels_matrix, test_images, test_labels, test_amount] = load_mnist_data()
%%load training and testing data, labels
data_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/train-images.idx3-ubyte';
label_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/train-labels.idx1-ubyte';
test_data_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/t10k-images.idx3-ubyte';
test_label_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/t10k-labels.idx1-ubyte';
images = loadMNISTImages(data_location);
labels = loadMNISTLabels(label_location);
test_images=loadMNISTImages(test_data_location);
test_labels=loadMNISTLabels(test_label_location);
%%data centralization
[dimension train_amount]=size(images);
[dimension test_amount]=size(test_images);
%%complete normalization
%%transform labels from index to matrix in order to apply square loss function in output layer
labels_matrix=full(sparse(labels+1,1:train_amount,1));
test_labels_matrix=full(sparse(test_labels+1,1:test_amount,1));
end
When you shuffle the images, the data-label association is lost. Since this association must survive the shuffle, you need to enforce the same shuffling on both data and labels.
To do so you could, for instance, create an external shuffled index list, shuffled=randperm(N), with N the number of images, and then pass to the train method either the list itself or the elements of images and labels addressed by the shuffled list.
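A minimal sketch of that idea, using the variable names from the question:

N = size(images, 2);                 % number of training samples
shuffled = randperm(N);              % one shared permutation
images_shuffled = images(:, shuffled);
labels_shuffled = labels(shuffled);
labels_matrix_shuffled = labels_matrix(:, shuffled); % same permutation keeps data and labels aligned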
I'm trying to implement a simple SVM linear binary classification in MATLAB, but I got strange results.
I have two classes g={-1;1} defined by two predictors, varX and varY. In fact, varY is enough to classify the dataset into two distinct classes (at about varY=0.38), but I will keep varX as a random variable since I will need it for other work.
Using the code below (adapted from MATLAB examples) I get a wrong classifier. The linear classifier should be closer to a horizontal line at about varY=0.38, as we can perceive by plotting the 2D points.
The line that should separate the two classes is not displayed.
What am I doing wrong?
g(1:14,1)=1;
g(15:26,1)=-1;
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528;
0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY
SVMmodel_testm = fitcsvm(m3,g,'KernelFunction','Linear');
d = 0.005; % Step size of the grid
[x1Grid,x2Grid] = meshgrid(min(m3(:,1)):d:max(m3(:,1)),...
min(m3(:,2)):d:max(m3(:,2)));
xGrid = [x1Grid(:),x2Grid(:)]; % The grid
[~,scores2] = predict(SVMmodel_testm,xGrid); % The scores
figure();
h(1:2)=gscatter(m3(:,1), m3(:,2), g,'br','ox');
hold on
% Support vectors
h(3) = plot(m3(SVMmodel_testm.IsSupportVector,1),m3(SVMmodel_testm.IsSupportVector,2),'ko','MarkerSize',10);
% Decision boundary
contour(x1Grid,x2Grid,reshape(scores2(:,1),size(x1Grid)),[0 0],'k');
xlabel('varX'); ylabel('varY');
set(gca,'Color',[0.5 0.5 0.5]);
hold off
A common problem with SVM, or any classification method for that matter, is unnormalized data. You have one dimension that spans from 0 to 1 and the other from about 0.3 to 0.4. This causes an imbalance between the features. Common practice is to normalize the features somehow, for example by their standard deviation. Try this code:
g(1:14,1)=1;
g(15:26,1)=-1;
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528;
0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY
m3(:,2) = m3(:,2)./std(m3(:,2));
SVMmodel_testm = fitcsvm(m3,g,'KernelFunction','Linear');
Notice the second-to-last line, which rescales varY by its standard deviation.
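Alternatively, fitcsvm can standardize the predictors for you via its 'Standardize' name-value pair, which rescales each predictor by its own mean and standard deviation:

SVMmodel_testm = fitcsvm(m3,g,'KernelFunction','Linear','Standardize',true);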
I'm stuck on numerically evaluating this integral and only have experience with simple quadrature schemes, so bear with me if this looks amateurish. The nested integral below is evaluated with a Gauss-Hermite scheme on (-inf,+inf) and a variant of a Gauss-Laguerre scheme on (0,+inf), i.e., over a Gaussian density and a gamma(v/2,2) density. I get to the point where I am nearly finished with the integration, and the intermediate steps look good, but I am stuck on how to combine the weights to evaluate the overall integral. I'd be really grateful for suggestions on modifying the code, or other ideas for a better quadrature scheme that solves the problem.
\begin{equation}
\int_{-\infty}^{\infty}\int_{0}^{\infty} \prod_{i=1}^{n}\Phi\left(\frac{\sqrt{w/v}\,C_{i}-a_{i}Z}{\sqrt{1-a_{i}^{2}}}\right) f_{Z}(Z)\, f_{W}(W)\, dw\, dz
\end{equation}
% script defines nodes, weights, parameters then calls one main and one subfunction
rho=0.3; nfirms=10; h=repmat(0.1,[1,nfirms]); T=1; R=0.4; v=8; alpha=v/2;GaussPts=15;
% Quadrature nodes - gaussian and gamma(v/2) from Miranda and Fackler CompEcon
% toolbox
[x_norm,w_norm] = qnwnorm(GaussPts,0,1);
[x_gamma,w_gamma] = qnwgamma(GaussPts,alpha);
L_mat=zeros(nfirms+1,GaussPts);
for i=1:1:GaussPts;
L_mat(:,i) = TC_gamma(x_norm(i,:),x_gamma(i,:),h,rho,T,v,nfirms);
end;
w_norm_mat= repmat(w_norm',nfirms+1,1);
w_gamma_mat = repmat(w_gamma',nfirms+1,1);
% need to weight L_mat by the gaussian and the chi-sq, i.e., gamma(v/2,2), weights?
ucl = L_mat.*w_norm;%?? HERE
ucl2 = sum(ucl.*w_gamma2,2);% ?? HERE
function [out] = TC_gamma(x_norm,x_gamma,h,rho,T,v,nfirms)
% calls subfunction feeds into recursion
qki= Vec_CondPTC_gamma(x_norm,x_gamma,h,rho,T,v)' ;
fpdf=zeros(nfirms+1,nfirms+1);
% start at the first point on the tree
fpdf(1,1)=1;
for i=2:nfirms+1 ;
fpdf(1,i)=fpdf(1,i-1)*(1-qki(:,i-1));
for j=2:nfirms+1;
fpdf(j,i)=fpdf(j,i-1)*(1-qki(:,i-1))+fpdf(j-1,i-1)*qki(:,i-1);
end
fpdf(i,i)=fpdf(i-1,i-1)*qki(:,i-1);
end
out=fpdf(:,end);
end% of function TC_gamma
function qki= Vec_CondPTC_gamma(x_norm,x_gamma,h,rho,T,v)
PD = (1-exp(-kron(h,T)));DB = tinv(PD,v);
a=rho.^0.5; sqrt1_a2 = sqrt(1-sum(a.*a,2));
aM = gtimes(a, x_norm'); Sqrt_W=gamcdf(x_gamma,v/2,2).^0.5;
DB_times_W= gtimes(DB,Sqrt_W); DB_minus_aM = gminus(DB_times_W',aM);
qki=normcdf(grdivide(DB_minus_aM,sqrt1_a2));
end% of function Vec_CondPTC
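For what it's worth, one standard way to combine the two rules is a full tensor product: evaluate the integrand at every pair of a z-node and a w-node, and weight each evaluation by the product of the corresponding weights. A sketch under that assumption, reusing TC_gamma and the script's variables (note it evaluates all GaussPts^2 node pairs rather than only the diagonal pairs the loop above visits):

% tensor-product quadrature over the two sets of nodes
ucl = zeros(nfirms+1,1);
for i = 1:GaussPts
    for j = 1:GaussPts
        L_ij = TC_gamma(x_norm(i,:), x_gamma(j,:), h, rho, T, v, nfirms);
        ucl = ucl + w_norm(i) * w_gamma(j) * L_ij; % product of the two weights
    end
end
% ucl now approximates the (nfirms+1)-vector of double integrals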