Hi, I am trying to build the following neural net scheme using either PyTorch or Keras, but I don't know how to do it. Can anyone help?
scheme:
Here is a Keras implementation using the functional API:
from keras.models import Model
from keras.layers import Dense, Input, concatenate

def createModel(inp_1_shape, inp_2_shape):
    # Two separate input branches, each reduced to a single unit
    first_input = Input(shape=(inp_1_shape,))
    first_dense = Dense(1)(first_input)
    second_input = Input(shape=(inp_2_shape,))
    second_dense = Dense(1)(second_input)
    # Merge the two branches and stack a few Dense layers on top
    merge = concatenate([first_dense, second_dense])
    merge = Dense(2)(merge)
    merge = Dense(3)(merge)
    merge = Dense(1)(merge)
    model = Model(inputs=[first_input, second_input], outputs=merge)
    model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
    return model
Just call this function and it will return a Keras Model; you might need to double-check the number of neurons in each layer, but other than that you should be fine. For example:
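A minimal usage sketch; the input sizes below are placeholder values, not from the original scheme:

# Placeholder feature counts for the two inputs
model = createModel(10, 5)
model.summary()
# model.fit([X1, X2], y, epochs=10)  # X1, X2 are the two input arrays, y the targets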
Happy Training
Related
I'm trying to implement a very simple one-hidden-layer MLP for a toy regression problem with one variable (dimension = 1) and one target (dimension = 1). It's a simple curve-fitting problem with zero noise.
Matlab\Deep Learning Toolbox
Using Levenberg-Marquardt backpropagation on an MLP with a single hidden layer of 100 neurons and hyperbolic tangent activation, I got pretty decent performance with almost zero effort:
MSE = 7.18e-08
Plotting the predictions and the targets I get a very precise fitting.
Python\Tensorflow\Keras
With the same network settings I used in Matlab there is almost no training, no matter how hard I try to tune the training parameters or switch the optimizer.
MSE = 0.12900154
In this case the plot of the predictions is a curve that is not even able to follow the oscillations of the target curve.
I can obtain something better using ReLU activations for the hidden layer, but we're still far off:
MSE = 0.0582045
This is the code I used in Python:
# IMPORT LIBRARIES
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
# IMPORT DATASET FROM CSV FILE, SHUFFLE TRAINING SET
# AND MAKE NUMPY ARRAY FOR TRAINING (DATA ARE ALREADY NORMALIZED)
dataset_path = "C:/Users/Rob/Desktop/Learning1.csv"
Learning_Dataset = pd.read_csv(dataset_path, comment='\t', sep=",",
                               skipinitialspace=False)
Learning_Dataset = Learning_Dataset.sample(frac=1)  # SHUFFLING
test_dataset_path = "C:/Users/Rob/Desktop/Test1.csv"
Test_Dataset = pd.read_csv(test_dataset_path, comment='\t', sep=",",
                           skipinitialspace=False)
Learning_Target = Learning_Dataset.pop('Target')
Test_Target = Test_Dataset.pop('Target')
Learning_Dataset = np.array(Learning_Dataset,dtype = "float32")
Test_Dataset = np.array(Test_Dataset,dtype = "float32")
Learning_Target = np.array(Learning_Target,dtype = "float32")
Test_Target = np.array(Test_Target,dtype = "float32")
# DEFINE SIMPLE MLP MODEL
inputs = tf.keras.layers.Input(shape=(1,))
x = tf.keras.layers.Dense(100, activation='relu')(inputs)
y = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs=inputs, outputs=y)
# TRAIN MODEL
opt = tf.keras.optimizers.RMSprop(learning_rate = 0.001,
rho = 0.9,
momentum = 0.0,
epsilon = 1e-07,
centered = False)
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=100)
model.compile(optimizer = opt,
loss = 'mse',
metrics = ['mse'])
model.fit(Learning_Dataset,
Learning_Target,
epochs=500,
validation_split = 0.2,
verbose=0,
callbacks=[early_stop],
shuffle = False,
batch_size = 100)
# INFERENCE AND CHECK ACCURACY
Predictions = model.predict(Test_Dataset)
Predictions = Predictions.reshape(10000)
print(np.square(np.subtract(Test_Target,Predictions)).mean()) # MSE
plt.plot(Test_Dataset,Test_Target,'o',Test_Dataset,Predictions,'o')
plt.legend(('Target','Model Prediction'))
plt.show()
What am I doing wrong?
Thanks
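For comparison, here is a minimal sketch of the tanh-activated variant mentioned above (a single hidden layer of 100 tanh units, matching the Matlab configuration); the optimizer, learning rate, and epoch count are illustrative assumptions, not settings from the original post:

# Sketch only: same normalized 1-D regression data as above
inputs = tf.keras.layers.Input(shape=(1,))
x = tf.keras.layers.Dense(100, activation='tanh')(inputs)  # tanh, as in the Matlab MLP
y = tf.keras.layers.Dense(1)(x)
tanh_model = tf.keras.Model(inputs=inputs, outputs=y)

# Adam and a longer schedule are assumptions, not a verified fix
tanh_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss='mse')
tanh_model.fit(Learning_Dataset, Learning_Target,
               epochs=2000, batch_size=100, validation_split=0.2, verbose=0)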
I have the following model:
from keras.models import Model
from keras.layers import Input, LSTM, Dense, Dropout

sharedLSTM1 = LSTM(data.shape[1], return_sequences=True)
sharedLSTM2 = LSTM(data.shape[1])

def createModel(dropoutRate=0.0, numNeurons=40, optimizer='adam'):
    inputLayer = Input(shape=(timesteps, data.shape[1]))
    sharedLSTM1Instance = sharedLSTM1(inputLayer)
    sharedLSTM2Instance = sharedLSTM2(sharedLSTM1Instance)
    dropoutLayer = Dropout(dropoutRate)(sharedLSTM2Instance)
    denseLayer1 = Dense(numNeurons)(dropoutLayer)
    denseLayer2 = Dense(numNeurons)(denseLayer1)
    outputLayer = Dense(1, activation='sigmoid')(denseLayer2)
    return (inputLayer, outputLayer)

inputLayer1, outputLayer1 = createModel()
inputLayer2, outputLayer2 = createModel()

model = Model(inputs=[inputLayer1, inputLayer2], outputs=[outputLayer1, outputLayer2])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
What will be the behaviour of model.fit([data1, data2], [labels1, labels2]) in this model? Will it alternately train the two NNs each epoch? Or will it completely train one network and then train the other? Or maybe some other way?
It will train the single network that exists, all at once.
You don't have two models; you have only one model, and that model is what will be trained.
data1 and data2 will be fed simultaneously.
The loss function will be applied to both outputs, and both will backpropagate.
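A minimal sketch of that single training call; the array names and hyperparameters here are placeholders, not from the original post:

# data1/data2: shape (n_samples, timesteps, n_features); labels1/labels2: shape (n_samples,)
# Each output gets binary_crossentropy; the per-output losses are summed into one total loss
# and a single backward pass updates all weights, including the shared LSTM layers.
model.fit([data1, data2], [labels1, labels2],
          epochs=10, batch_size=32, validation_split=0.2)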
I'm taking my first steps with Keras and I'm trying to do binary classification on the breast cancer dataset available in scikit-learn.
# load dataset
from sklearn import datasets
cancer = datasets.load_breast_cancer()
cancer.data
# dataset into pd.dataframe
import pandas as pd
donnee = pd.concat([pd.DataFrame(data=cancer.data, columns=cancer.feature_names),
                    pd.DataFrame(data=cancer.target, columns=["target"])],
                   axis=1)
# train/test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(donnee.loc[:, donnee.columns != "target"], donnee.target, test_size = 0.25, random_state = 1)
I'm trying to follow keras' tutorial here : https://keras.io/#getting-started-30-seconds-to-keras
The thing is, I always get the same loss value (6.1316862406430541), and the same accuracy (0.61538461830232527), because the predictions are always 1.
I'm not sure if it's because of a code error:
I don't know, maybe the shape of X_train is wrong?
Or maybe I'm doing something wrong with epochs and/or batch_size.
Or if it's because of the network itself:
if I'm not mistaken, all-1 predictions are possible if the layers have no biases, and I don't know yet how they're initialized
But maybe it's something else; maybe one layer only is too few? (If so, I wonder why Keras' tutorial uses only one layer...)
Here is my code, if you have any idea :
import keras
from keras.models import Sequential
model = Sequential()
from keras.layers import Dense
model.add(Dense(units=64, activation='relu', input_dim=30))
model.add(Dense(units=1, activation='sigmoid'))
model.summary()
model.compile(loss = keras.losses.binary_crossentropy,
optimizer = 'rmsprop',
metrics=['accuracy']
)
model.fit(X_train.as_matrix(), y_train.as_matrix().reshape(426, -1), epochs=5, batch_size=32)
loss_and_metrics = model.evaluate(X_test.as_matrix(), y_test.as_matrix(), batch_size=128)
loss_and_metrics
classes = model.predict(X_test.as_matrix(), batch_size=128)
classes
This is a very common case. If you check the histogram of your data you will see that there are data points in your dataset whose coordinates span from 0 to 100. When you feed such data to a neural network, the inputs reaching the sigmoid can be so large that it saturates. To scale the data you could use either MinMaxScaler or StandardScaler, which will bring your data into a range suitable for neural network computations, as in the sketch below.
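A minimal sketch of the scaling step using StandardScaler and the variable names from the question; fitting the scaler on the training split only is my assumption (standard practice), not something stated in the original answer:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit the scaler on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same statistics for the test data

# .values is equivalent to the .as_matrix() calls used above
model.fit(X_train_scaled, y_train.values.reshape(-1, 1), epochs=5, batch_size=32)
loss_and_metrics = model.evaluate(X_test_scaled, y_test.values, batch_size=128)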
This is a snippet of my model:
W1 = create_base_network(latent_dim)
input_a = Input(shape=(1,latent_dim))
input_b = Input(shape=(1,latent_dim))
x_a = encoder(input_a)
x_b = encoder(input_b)
processed_a = W1(x_a)
processed_b = W1(x_b)
del1 = Lambda(Delta1, output_shape=Delta1_output_shape)([processed_a, processed_b])
model = Model(input=[input_a, input_b], output=del1)
# train
rms = RMSprop()
model.compile(loss='kappa_delta_loss', optimizer=rms)
Basically, the neural net is given the (pre-trained) encoder representations of the two inputs and computes the difference between the prediction values for the two inputs by passing them through an MLP. This difference is Delta1, which is the y_pred of the network. I want the loss function to be y_pred*y_true. However, when I do that, I get the error 'Invalid objective: kappa_delta_loss'.
What am I doing wrong?
You almost answered the question yourself. Create your objective function like the ones in https://github.com/fchollet/keras/blob/master/keras/objectives.py, like this:

import theano
import theano.tensor as T

epsilon = 1.0e-9

def custom_objective(y_true, y_pred):
    '''Just another crossentropy'''
    y_pred = T.clip(y_pred, epsilon, 1.0 - epsilon)
    y_pred /= y_pred.sum(axis=-1, keepdims=True)
    cce = T.nnet.categorical_crossentropy(y_pred, y_true)
    return cce

and pass it to the compile argument:

model.compile(loss=custom_objective, optimizer='adadelta')

(from https://github.com/fchollet/keras/issues/369)
So you should create your custom loss function with two arguments, the first being the target and the second your prediction.
Assuming your output (y_pred) is a scalar, your custom objective could be:

from keras import backend as K

def custom_objective(y_true, y_pred):
    return K.dot(y_true, y_pred)

where K is the Keras backend (more generic than the Theano example above).
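Then pass the function object itself, rather than the string 'kappa_delta_loss', to compile; the rms optimizer below is the one already defined in the question:

# Passing the callable avoids the 'Invalid objective' error
model.compile(loss=custom_objective, optimizer=rms)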
I am trying to implement an answer selection model in deep learning in Keras, as shown below, based on this paper.
I understand how to implement the embedding, bi-LSTM, and pooling steps in the above flow.
But how do I implement the merge step that computes the cosine similarity, and the loss function, in Keras?
The loss function is defined as
L = max{0, M - cosine(q, a+) + cosine(q, a-)}
where,
M = constant margin
q = question
a+ = correct answer
a- = wrong answer
Update 1:
After going through a few blogs, this is how I implemented it:
#build model
input_question = Input(shape=(max_len, embedding_dim))
input_sentence = Input(shape=(max_len, embedding_dim))
question_lstm = Bidirectional(LSTM(64))
sentence_lstm = Bidirectional(LSTM(64))
encoded_question = question_lstm(input_question)
encoded_sentence = sentence_lstm(input_sentence)
cos_distance = merge([encoded_question, encoded_sentence], mode='cos', dot_axes=1)
cos_distance = Reshape((1,))(cos_distance)
cos_similarity = Lambda(lambda x: 1-x)(cos_distance)
predictions = Dense(1, activation='sigmoid')(cos_similarity)
model = Model([input_question, input_sentence], [predictions])
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
With the above implementation, I am still not able to figure out how to implement the hinge loss. Please help.
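One possible sketch of the margin loss from the question, expressed directly in Keras; the three-input (question, correct answer, wrong answer) layout, the Dot layer with normalize=True (the Keras 2 counterpart of merge with mode='cos'), the margin value, and the shared answer encoder are all assumptions on my part, not code from the original post:

from keras.layers import Input, LSTM, Bidirectional, Dot, Lambda
from keras.models import Model
from keras import backend as K

M = 0.2  # margin; assumed value

input_question = Input(shape=(max_len, embedding_dim))
input_pos = Input(shape=(max_len, embedding_dim))  # correct answer a+
input_neg = Input(shape=(max_len, embedding_dim))  # wrong answer a-

question_lstm = Bidirectional(LSTM(64))
answer_lstm = Bidirectional(LSTM(64))  # shared between a+ and a-

q = question_lstm(input_question)
a_pos = answer_lstm(input_pos)
a_neg = answer_lstm(input_neg)

# Dot with normalize=True computes cosine similarity
cos_pos = Dot(axes=1, normalize=True)([q, a_pos])
cos_neg = Dot(axes=1, normalize=True)([q, a_neg])

# L = max{0, M - cosine(q, a+) + cosine(q, a-)}
hinge = Lambda(lambda x: K.relu(M - x[0] + x[1]))([cos_pos, cos_neg])

model = Model([input_question, input_pos, input_neg], hinge)
# The model's output already is the loss, so the compiled objective just passes it through;
# train with dummy zero targets, e.g. model.fit([q_data, pos_data, neg_data], np.zeros((n, 1)))
model.compile(optimizer='adam', loss=lambda y_true, y_pred: y_pred)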