Using Julia Flux to build a simple neural network - neural-network

I have a dataset of images (https://www.kaggle.com/iarunava/cell-images-for-detecting-malaria), and I want to use a neural network to know if one picture is a uninfected cell or not.
So I arranged my data to get 4 variables :
X_tests, Y_tests, X_training, Y_training
Each of these variable is of type Array{Array{Float64,1},1}
And I have a function to build a simple neural network (that comes from an example https://smist08.wordpress.com/2018/09/24/julia-flux-for-machine-learning/):
function simple_nn(X_tests, Y_tests, X_training, Y_training)
input = 100*100*3
hl1 = 32
m = Chain(
Dense(input, 32, relu),
Dense(32, 2),
softmax) |> gpu
loss(x, y) = crossentropy(m(x), y)
accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))
dataset = [(X_training,Y_training)]
evalcb = () -> #show(loss(X_training, Y_training))
opt = ADAM(params(m))
Flux.train!(loss, dataset, opt, cb = throttle(evalcb, 10))
println("acc X,Y ", accuracy(X_training, Y_training))
println("acc tX, tY ", accuracy(X_tests, Y_tests))
end
And I get this error after executing simple_nn(X_tests, Y_tests, X_training, Y_training) :
ERROR: DimensionMismatch("matrix A has dimensions (32,30000), vector B has length 2668")
...
The error is on this line : Flux.train!(loss, dataset, opt, cb = throttle(evalcb, 10))
I don't know what the functions are doing, what argument they take, what they are returning and I can't find any documentation on the internet. I can only find examples.
So I have two questions : How can I make this work for my dataset? And Is there a documentation for Flux functions, like for sklearn? (like this for example : https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)

Can you provide a self-contained MWE? I think your X_training is not of dimension 3*100*100x something, and it is in fact 2688 by something.
Your first layer is Dense(input, 32, relu) and input is 3*100*100, so it's expect an input where one of dimension is 3*100*100 which you don't satisfy.

Maybe try to replace
dataset = [(X_training,Y_training)]
with
dataset = zip(X_training,Y_training)
zip actually pairs the training data 1 of X with 1 of Y and thus turns a tuple of vectors into a vector of tuples. I would guess that your training data has 2688 samples?

Related

Fitting a neural network with ReLUs to polynomial functions

Out of curiosity I am trying to fit neural network with rectified linear units to polynomial functions.
For example, I would like to see how easy (or difficult) it is for a neural network to come up with an approximation for the function f(x) = x^2 + x. The following code should be able to do it, but seems to not learn anything. When I run
using Base.Iterators: repeated
ENV["JULIA_CUDA_SILENT"] = true
using Flux
using Flux: throttle
using Random
f(x) = x^2 + x
x_train = shuffle(1:1000)
y_train = f.(x_train)
x_train = hcat(x_train...)
m = Chain(
Dense(1, 45, relu),
Dense(45, 45, relu),
Dense(45, 1),
softmax
)
function loss(x, y)
Flux.mse(m(x), y)
end
evalcb = () -> #show(loss(x_train, y_train))
opt = ADAM()
#show loss(x_train, y_train)
dataset = repeated((x_train, y_train), 50)
Flux.train!(loss, params(m), dataset, opt, cb = throttle(evalcb, 10))
println("Training finished")
#show m([20])
it returns
loss(x_train, y_train) = 2.0100101f14
loss(x_train, y_train) = 2.0100101f14
loss(x_train, y_train) = 2.0100101f14
Training finished
m([20]) = Float32[1.0]
Anyone here sees how I could make the network fit f(x) = x^2 + x?
There seem to be couple of things wrong with your trial that have mostly to do with how you use your optimizer and treat your input -- nothing wrong with Julia or Flux. Provided solution does learn, but is by no means optimal.
It makes no sense to have softmax output activation on a regression problem. Softmax is used in classification problems where the output(s) of your model represent probabilities and therefore should be on the interval (0,1). It is clear your polynomial has values outside this interval. It is usual to have linear output activation in regression problems like these. This means in Flux no output activation should be defined on the output layer.
The shape of your data matters. train! computes gradients for loss(d...) where d is a batch in your data. In your case a minibatch consists of 1000 samples, and this same batch is repeated 50 times. Neural nets are often trained with smaller batches sizes, but a larger sample set. In the code I provided all batches consist of different data.
For training neural nets, in general, it is advised to normalize your input. Your input takes values from 1 to 1000. My example applies a simple linear transformation to get the input data in the right range.
Normalization can also apply to the output. If the outputs are large, this can result in (too) large gradients and weight updates. Another approach is to lower the learning rate a lot.
using Flux
using Flux: #epochs
using Random
normalize(x) = x/1000
function generate_data(n)
f(x) = x^2 + x
xs = reduce(hcat, rand(n)*1000)
ys = f.(xs)
(normalize(xs), normalize(ys))
end
batch_size = 32
num_batches = 10000
data_train = Iterators.repeated(generate_data(batch_size), num_batches)
data_test = generate_data(100)
model = Chain(Dense(1,40, relu), Dense(40,40, relu), Dense(40, 1))
loss(x,y) = Flux.mse(model(x), y)
opt = ADAM()
ps = Flux.params(model)
Flux.train!(loss, ps, data_train, opt , cb = () -> #show loss(data_test...))

N-dimensional GP Regression

I'm trying to use GPflow for a multidimensional regression. But I'm confused by the shapes of the mean and variance.
For example: A 2-dimensional input space X of shape (20,20) is supposed to be predicted. My training samples are of shape (8,2) which means 8 training samples overall for the two dimensions. The y-values are of shape (8,1) which of course means one value of the ground truth per combination of the 2 input dimensions.
If I now use model.predict_y(X) I would expect to receive a mean of shape (20,20) but obtain a shape of (20,1). Same goes for the variance. I think that this problem comes from the shape of the y-values but I have have no idea how to fix it.
bound = 3
num = 20
X = np.random.uniform(-bound, bound, (num,num))
print(X_sample.shape) # (8,2)
print(Y_sample.shape) # (8,1)
k = gpflow.kernels.RBF(input_dim=2)
m = gpflow.models.GPR(X_sample, Y_sample, kern=k)
m.likelihood.variance = sigma_n
m.compile()
gpflow.train.ScipyOptimizer().minimize(m)
mean, var = m.predict_y(X)
print(mean.shape) # (20, 1)
print(var.shape) # (20, 1)
It sounds like you may be confused between the shape of a grid of input positions and the shape of the numpy arrays: if you want to predict on a 20 x 20 grid in two dimensions, you have 400 points in total, each with 2 values. So X (the one that you pass to m.predict_y()) should have shape (400, 2). (Note that the second dimension needs to have the same shape as X_sample!)
To construct this array of shape (400,2) you can use np.meshgrid (e.g., see What is the purpose of meshgrid in Python / NumPy?).
m.predict_y(X) only predicts the marginal variance at each test point, so the returned mean and var both have shape (400,1) (same length as X). You can of course reshape them to the 20 x 20 values on your grid.
(It is also possible to compute the full covariance, for the latent f this is implemented as m.predict_f_full_cov, which for X of shape (400,2) would return a 400x400 matrix. This is relevant if you want consistent samples from the GP, but I suspect that goes well beyond this question.)
I was indeed making the mistake to not flatten the arrays which in return produced the mistake. Thank you for the fast response STJ!
Here is an example of the working code:
# Generate data
bound = 3.
x1 = np.linspace(-bound, bound, num)
x2 = np.linspace(-bound, bound, num)
x1_mesh,x2_mesh = np.meshgrid(x1, x2)
X = np.dstack([x1_mesh, x2_mesh]).reshape(-1, 2)
z = f(x1_mesh, x2_mesh) # evaluation of the function on the grid
# Draw samples from feature vectors and function by a given index
size = 2
np.random.seed(1991)
index = np.random.choice(range(len(x1)), size=(size,X.ndim), replace=False)
samples = utils.sampleFeature([x1,x2], index)
X1_sample = samples[0]
X2_sample = samples[1]
X_sample = np.column_stack((X1_sample, X2_sample))
Y_sample = utils.samplefromFunc(f=z, ind=index)
# Change noise parameter
sigma_n = 0.0
# Construct models with initial guess
k = gpflow.kernels.RBF(2,active_dims=[0,1], lengthscales=1.0,ARD=True)
m = gpflow.models.GPR(X_sample, Y_sample, kern=k)
m.likelihood.variance = sigma_n
m.compile()
#print(X.shape)
mean, var = m.predict_y(X)
mean_square = mean.reshape(x1_mesh.shape) # Shape: (num,num)
var_square = var.reshape(x1_mesh.shape) # Shape: (num,num)
# Plot mean
fig = plt.figure(figsize=(16, 12))
ax = plt.axes(projection='3d')
ax.plot_surface(x1_mesh, x2_mesh, mean_square, cmap=cm.viridis, linewidth=0.5, antialiased=True, alpha=0.8)
cbar = ax.contourf(x1_mesh, x2_mesh, mean_square, zdir='z', offset=offset, cmap=cm.viridis, antialiased=True)
ax.scatter3D(X1_sample, X2_sample, offset, marker='o',edgecolors='k', color='r', s=150)
fig.colorbar(cbar)
for t in ax.zaxis.get_major_ticks(): t.label.set_fontsize(fontsize_ticks)
ax.set_title("$\mu(x_1,x_2)$", fontsize=fontsize_title)
ax.set_xlabel("\n$x_1$", fontsize=fontsize_label)
ax.set_ylabel("\n$x_2$", fontsize=fontsize_label)
ax.set_zlabel('\n\n$\mu(x_1,x_2)$', fontsize=fontsize_label)
plt.xticks(fontsize=fontsize_ticks)
plt.yticks(fontsize=fontsize_ticks)
plt.xlim(left=-bound, right=bound)
plt.ylim(bottom=-bound, top=bound)
ax.set_zlim3d(offset,np.max(z))
which leads to (red dots are the sample points drawn from the function). Note: Code not refactored what so ever :)

Merging two tensors by convolution in Keras

I'm trying to convolve two 1D tensors in Keras.
I get two inputs from other models:
x - of length 100
ker - of length 5
I would like to get the 1D convolution of x using the kernel ker.
I wrote a Lambda layer to do it:
import tensorflow as tf
def convolve1d(x):
y = tf.nn.conv1d(value=x[0], filters=x[1], padding='VALID', stride=1)
return y
x = Input(shape=(100,))
ker = Input(shape=(5,))
y = Lambda(convolve1d)([x,ker])
model = Model([x,ker], [y])
I get the following error:
ValueError: Shape must be rank 4 but is rank 3 for 'lambda_67/conv1d/Conv2D' (op: 'Conv2D') with input shapes: [?,1,100], [1,?,5].
Can anyone help me understand how to fix it?
It was much harder than I expected because Keras and Tensorflow don't expect any batch dimension in the convolution kernel so I had to write the loop over the batch dimension myself, which requires to specify batch_shape instead of just shape in the Input layer. Here it is :
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K
from keras import Input, Model
from keras.layers import Lambda
def convolve1d(x):
input, kernel = x
output_list = []
if K.image_data_format() == 'channels_last':
kernel = K.expand_dims(kernel, axis=-2)
else:
kernel = K.expand_dims(kernel, axis=0)
for i in range(batch_size): # Loop over batch dimension
output_temp = tf.nn.conv1d(value=input[i:i+1, :, :],
filters=kernel[i, :, :],
padding='VALID',
stride=1)
output_list.append(output_temp)
print(K.int_shape(output_temp))
return K.concatenate(output_list, axis=0)
batch_input_shape = (1, 100, 1)
batch_kernel_shape = (1, 5, 1)
x = Input(batch_shape=batch_input_shape)
ker = Input(batch_shape=batch_kernel_shape)
y = Lambda(convolve1d)([x,ker])
model = Model([x, ker], [y])
a = np.ones(batch_input_shape)
b = np.ones(batch_kernel_shape)
c = model.predict([a, b])
In the current state :
It doesn't work for inputs (x) with multiple channels.
If you provide several filters, you get as many outputs, each being the convolution of the input with the corresponding kernel.
From given code it is difficult to point out what you mean when you say
is it possible
But if what you mean is to merge two layers and feed merged layer to convulation, yes it is possible.
x = Input(shape=(100,))
ker = Input(shape=(5,))
merged = keras.layers.concatenate([x,ker], axis=-1)
y = K.conv1d(merged, 'same')
model = Model([x,ker], y)
EDIT:
#user2179331 thanks for clarifying your intention. Now you are using Lambda Class incorrectly, that is why the error message is showing.
But what you are trying to do can be achieved using keras.backend layers.
Though be noted that when using lower level layers you will lose some higher level abstraction. E.g when using keras.backend.conv1d you need to have input shape of (BATCH_SIZE,width, channels) and kernel with shape of (kernel_size,input_channels,output_channels). So in your case let as assume the x has channels of 1(input channels ==1) and y also have the same number of channels(output channels == 1).
So your code now can be refactored as follows
from keras import backend as K
def convolve1d(x,kernel):
y = K.conv1d(x,kernel, padding='valid', strides=1,data_format="channels_last")
return y
input_channels = 1
output_channels = 1
kernel_width = 5
input_width = 100
ker = K.variable(K.random_uniform([kernel_width,input_channels,output_channels]),K.floatx())
x = Input(shape=(input_width,input_channels)
y = convolve1d(x,ker)
I guess I have understood what you mean. Given the wrong example code below:
input_signal = Input(shape=(L), name='input_signal')
input_h = Input(shape=(N), name='input_h')
faded= Lambda(lambda x: tf.nn.conv1d(input, x))(input_h)
You want to convolute each signal vector with different fading coefficients vector.
The 'conv' operation in TensorFlow, etc. tf.nn.conv1d, only support a fixed value kernel. Therefore, the code above can not run as you want.
I have no idea, too. The code you given can run normally, however, it is too complex and not efficient. In my idea, another feasible but also inefficient way is to multiply with the Toeplitz matrix whose row vector is the shifted fading coefficients vector. When the signal vector is too long, the matrix will be extremely large.

How to build a recurrent neural net in Keras where each input goes through a layer first?

I'm trying to build an neural net in Keras that would look like this:
Where x_1, x_2, ... are input vectors that undergo the same transformation f. f is itself a layer whose parameters must be learned. The sequence length n is variable across instances.
I'm having trouble understanding two things here:
What should the input look like?
I'm thinking of a 2D tensor with shape (number_of_x_inputs, x_dimension), where x_dimension is the length of a single vector $x$. Can such 2D tensor have a variable shape? I know tensors can have variable shapes for batch processing, but I don't know if that helps me here.
How do I pass each input vector through the same transformation before feeding it to the RNN layer?
Is there a way to sort of extend for example a GRU so that an f layer is added before going through the actual GRU cell?
I'm not an expert, but I hope this helps.
Question 1:
Vectors x1, x2... xn can have different shapes, but I'm not sure if the instances of x1 can have different shapes. When I have different shapes I usually pad the short sequences with 0s.
Question 2:
I'm not sure about extending a GRU, but I would do something like this:
x_dims = [50, 40, 30, 20, 10]
n = 5
def network():
shared_f = Conv1D(5, 3, activation='relu')
shated_LSTM = LSTM(10)
inputs = []
to_concat = []
for i in range(n):
x_i = Input(shape=(x_dims[i], 1), name='x_' + str(i))
inputs.append(x_i)
step1 = shared_f(x_i)
to_concat.append(shated_LSTM(step1))
merged = concatenate(to_concat)
final = Dense(2, activation='softmax')(merged)
model = Model(inputs=inputs, outputs=[final])
# model = Model(inputs=[sequence], outputs=[part1])
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
return model
m = network()
In this example, I used a Conv1D as the shared f transformation, but you could use something else (Embedding, etc.).

How to use multiple parameters with fittype in Matlab

I have a 1000x2 data file that I'm using for this problem.
I am supposed to fit the data with Acos(wt + phi). t is time, which is the first column in the data file, i.e. the independent variable. I need to find the fit parameters (A, f, and phi) and their uncertainties.
My code is as follows:
%load initial data file
data = load('hw_fit_cos_problem.dat');
t = data(:,1); %1st column is t (time)
x = t;
y = data(:,2); %2nd column is y (signal strength)
%define fitting function
f = fittype('A*cos(w*x + p)','coefficients','A','problem',{'w','p'});
% check fit parameters
coeffs = coeffnames(f);
%fit data
[A] = fit(x,y,f)
disp('confidence interval/errorbars');
ci = confint(A)
which yields 4 different error messages that I don't understand.
Error Messages:
Error using fit>iAssertNumProblemParameters (line 1113)
Missing problem parameters. Specify the values as a cell array with one element for each problem parameter
in the fittype.
Error in fit>iFit (line 198)
iAssertNumProblemParameters( probparams, probnames( model ) );
Error in fit (line 109)
[fitobj, goodness, output, convmsg] = iFit( xdatain, ydatain, fittypeobj, ...
Error in problem2 (line 14)
[A] = fit(x,y,f)
The line of code
f = fittype('A*cos(w*x + p)','coefficients','A','problem',{'w','p'});
specifies A as a "coefficient" in the model, and the values w and p as "problem" parameters.
Thus, the fitting toolbox expects that you will provide some more information about w and p, and then it will vary A. When no further information about w and p was provided to the fitting tool, that resulted in an error.
I am not sure of the goal of this project, or why w and p were designated as problem parameters. However, one simple solution is to allow the toolbox to treat A, w, and p as "coefficients", as follows:
f = fittype('A*cos(w*x + p)','coefficients', {'A', 'w', 'p'});
In this case, the code will not throw an error, and will return 95% confidence intervals on A, w, and p.
I hope that helps.
The straightforward answer to your question is that the error "Missing problem parameters" is generated because you have identified w and p as problem-specific fixed parameters,
but you have not told the fit function what these fixed values are.
You can do this by changing the line
[A] = fit(x,y,f)
to
[A]=fit(x,y,f,'problem',{100,0.1})
which supplies the values w=100 and p=0.1 in the fit. This should resolve the errors you specified (all 4 error messages result from this error)
In general specifying some of the quantities in your fit equation as problem-specific fixed parameters might be a valid thing to do - for example if you have determined them independently and have good reason to believe the values you obtained to be reliable. In this case, you might know the frequency w in this way, but you most probably won't know the phase p, so that should be a fit coefficient.
Hope that helps.