How do I actually execute a saved TensorFlow model?

TensorFlow newbie here. I'm trying to build an RNN. My input data is a set of vector instances of size instance_size representing the (x, y) positions of a set of particles at each time step. (Since the instances already have semantic content, they do not require an embedding.) The goal is to learn to predict the positions of the particles at the next time step.
Following the RNN tutorial and slightly adapting the included RNN code, I create a model more or less like this (omitting some details):
inputs = self._input_data = tf.placeholder(
    tf.float32, [batch_size, num_steps, instance_size])
self._targets = tf.placeholder(
    tf.float32, [batch_size, num_steps, instance_size])

# Build a (possibly dropout-wrapped) multi-layer LSTM cell
with tf.variable_scope("lstm_cell", reuse=True):
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size, forget_bias=0.0)
    if is_training and config.keep_prob < 1:
        lstm_cell = tf.nn.rnn_cell.DropoutWrapper(
            lstm_cell, output_keep_prob=config.keep_prob)
    cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_layers)

self._initial_state = cell.zero_state(batch_size, tf.float32)

# Split the input into a list of per-time-step tensors and unroll the RNN
from tensorflow.models.rnn import rnn
inputs = [tf.squeeze(input_, [1])
          for input_ in tf.split(1, num_steps, inputs)]
outputs, state = rnn.rnn(cell, inputs, initial_state=self._initial_state)

# Project the hidden states back to instance_size-dimensional predictions
output = tf.reshape(tf.concat(1, outputs), [-1, hidden_size])
softmax_w = tf.get_variable("softmax_w", [hidden_size, instance_size])
softmax_b = tf.get_variable("softmax_b", [instance_size])
logits = tf.matmul(output, softmax_w) + softmax_b
loss = position_squared_error_loss(
    tf.reshape(logits, [-1]),
    tf.reshape(self._targets, [-1]))
self._cost = cost = tf.reduce_sum(loss) / batch_size
self._final_state = state
Then I create a saver = tf.train.Saver(), iterate over the data to train it using the given run_epoch() method, and write out the parameters with saver.save(). So far, so good.
But how do I actually use the trained model? The tutorial stops at this point. From the docs on tf.train.Saver.restore(), in order to read back in the variables, I need to either set up exactly the same graph I was running when I saved the variables out, or selectively restore particular variables. Either way, that means my new model will require inputs of size batch_size x num_steps x instance_size. However, all I want now is to do a single forward pass through the model on an input of size num_steps x instance_size and read out a single instance_size-sized result (the prediction for the next time step); in other words, I want to create a model that accepts a different-size tensor than the one I trained on. I can kludge it by passing the existing model my intended data batch_size times, but that doesn't seem like a best practice. What's the best way to do this?

You have to create a new graph with the same structure but with batch_size = 1, and then import the saved variables with tf.train.Saver.restore(). You can take a look at how they define multiple models with variable batch size in ptb_word_lm.py: https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/rnn/ptb/ptb_word_lm.py
So you can have a separate file, for instance, where you instantiate the graph with the batch_size that you want, then restore the saved variables. Then you can execute your graph.
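For illustration, here is a minimal sketch of that pattern (using the TF 0.x-era API from the question). It assumes the graph-building code above lives in a hypothetical Model class whose constructor takes batch_size, and that "model.ckpt" matches whatever path you passed to saver.save(); as long as the inference graph is built under the same variable scope as the training graph, the Saver can match the variables by name:
import tensorflow as tf

with tf.Graph().as_default(), tf.Session() as sess:
    # Same architecture as training, but batch_size=1 for single-example inference
    with tf.variable_scope("model"):
        m = Model(is_training=False, batch_size=1, num_steps=num_steps)

    saver = tf.train.Saver()
    saver.restore(sess, "model.ckpt")  # hypothetical checkpoint path

    # One forward pass on a single (num_steps x instance_size) input
    feed = {m.input_data: my_input.reshape(1, num_steps, instance_size)}
    prediction = sess.run(m.logits, feed_dict=feed)
The last row of prediction is then the instance_size-sized estimate for the next time step.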

Related

Problem understanding loss function behavior using Flux.jl in Julia

First of all, I am new to neural networks (NNs).
As part of my PhD, I am trying to solve a problem through an NN.
For this, I have created a program that generates a data set consisting of
a collection of input vectors (each with 63 elements) and their corresponding
output vectors (each with 6 elements).
So, my program looks like this:
Nₜᵣ = 25; # number of inputs in the data set
xtrain, ytrain = dataset_generator(Nₜᵣ); # generates In/Out vectors: xtrain/ytrain
datatrain = zip(xtrain,ytrain); # assemble my data
Now, both xtrain and ytrain are of type Array{Array{Float64,1},1}, meaning that
if (say) Nₜᵣ = 2, they look like:
julia> xtrain #same for ytrain
2-element Array{Array{Float64,1},1}:
[1.0, -0.062, -0.015, -1.0, 0.076, 0.19, -0.74, 0.057, 0.275, ....]
[0.39, -1.0, 0.12, -0.048, 0.476, 0.05, -0.086, 0.85, 0.292, ....]
The first 3 elements of each vector are normalized to unity (they represent x, y, z coordinates), and the following 60 numbers, also normalized to unity, correspond to some measurable attributes.
The program continues like:
layer1 = Dense(length(xtrain[1]),46,tanh); # setting 6 layers
layer2 = Dense(46,36,tanh) ;
layer3 = Dense(36,26,tanh) ;
layer4 = Dense(26,16,tanh) ;
layer5 = Dense(16,6,tanh) ;
layer6 = Dense(6,length(ytrain[1])) ;
m = Chain(layer1,layer2,layer3,layer4,layer5,layer6); # composing the layers
squaredCost(ym,y) = (1/2)*norm(y - ym).^2;
loss(x,y) = squaredCost(m(x),y); # define loss function
ps = Flux.params(m); # initializing mod.param.
opt = ADAM(0.01, (0.9, 0.8)); #
and finally:
trainmode!(m,true)
itermax = 700; # set max number of iterations
losses = [];
for iter in 1:itermax
Flux.train!(loss,ps,datatrain,opt);
push!(losses, sum(loss.(xtrain,ytrain)));
end
It runs perfectly; however, it has come to my attention that as I train my model with an increasing data set (Nₜᵣ = 10, 15, 25, etc.), the loss function seems to increase. See the image below:
Where: y1: Nₜᵣ=10, y2: Nₜᵣ=15, y3: Nₜᵣ=25.
So, my main question:
Why is this happening? I cannot see an explanation for this behavior. Is this somehow expected?
Remarks: Note that
All elements from the training data set (input and output) are normalized to [-1,1].
I have not tried changing the activation functions.
I have not tried changing the optimization method.
Considerations: I need a training data set of nearly 10,000 input vectors, so I am expecting an even worse scenario...
Some personal thoughts:
Am I arranging my training data set correctly? Say, if every single data vector is made of 63 numbers, is it correct to group them in an array and then pile them into an Array{Array{Float64,1},1}? I have no experience using NNs and Flux. How could I build a data set of 10,000 I/O vectors differently? Can this be the issue? (I am very inclined to this.)
Can this behavior be related to the chosen act. functions? (I am not inclined to this)
Can this behavior be related to the opt. algorithm? (I am not inclined to this)
Am I training my model wrong? Is the iteration loop really iterations, or are they epochs? I am struggling to put (differentiate) the concepts of "epochs" and "iterations" into practice.
loss(x,y) = squaredCost(m(x),y); # define loss function
Your losses aren't normalized, so adding more data can only increase this cost function. However, the cost per data point doesn't seem to be increasing. To get rid of this effect, use a normalized cost function, e.g. the mean squared error (Flux.mse), and track mean(loss.(xtrain, ytrain)) rather than sum(loss.(xtrain, ytrain)).

Fitting a sine wave with Keras and PYMC3 yields unexpected results

I've been trying to fit a sine curve with a Keras (Theano backend) model using pymc3. I've been using this post [http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learning/] as a reference point.
A Keras implementation alone, fit using optimization, does a good job; however, Hamiltonian Monte Carlo and variational sampling from pymc3 are not fitting the data. The trace is stuck at where the prior is initiated. When I move the prior, the posterior moves to the same spot. The posterior predictive of the Bayesian model in cell 59 is barely getting the sine wave, whereas the non-Bayesian fit model gets it near perfect in cell 63. I created a notebook here: https://gist.github.com/tomc4yt/d2fb694247984b1f8e89cfd80aff8706 which shows the code and the results.
Here is a snippet of the model below...
class GaussWeights(object):
    def __init__(self):
        self.count = 0

    def __call__(self, shape, name='w'):
        return pm.Normal(
            name, mu=0, sd=.1,
            testval=np.random.normal(size=shape).astype(np.float32),
            shape=shape)

def build_ann(x, y, init):
    with pm.Model() as m:
        i = Input(tensor=x, shape=x.get_value().shape[1:])
        m = i
        m = Dense(4, init=init, activation='tanh')(m)
        m = Dense(1, init=init, activation='tanh')(m)
        sigma = pm.Normal('sigma', 0, 1, transform=None)
        out = pm.Normal('out',
                        m, 1,
                        observed=y, transform=None)
    return out

with pm.Model() as neural_network:
    likelihood = build_ann(input_var, target_var, GaussWeights())
    # v_params = pm.variational.advi(
    #     n=300, learning_rate=.4
    # )
    # trace = pm.variational.sample_vp(v_params, draws=2000)
    start = pm.find_MAP(fmin=scipy.optimize.fmin_powell)
    step = pm.HamiltonianMC(scaling=start)
    trace = pm.sample(1000, step, progressbar=True)
The model contains normal noise with a fixed std of 1:
out = pm.Normal('out', m, 1, observed=y)
but the dataset does not. It is only natural that the predictive posterior does not match the dataset, since they were generated in very different ways. To make it more realistic, you could add noise to your dataset and then estimate sigma:
mu = pm.Deterministic('mu', m)
sigma = pm.HalfCauchy('sigma', beta=1)
pm.Normal('y', mu=mu, sd=sigma, observed=y)
What you are doing right now is similar to taking the output from the network and adding standard normal noise.
A couple of unrelated comments:
out is not the likelihood, it is just the dataset again.
If you use HamiltonianMC instead of NUTS, you need to set the step size and the integration time yourself; the defaults are not usually useful (see the sketch after these comments).
Seems like Keras changed in 2.0, and this way of combining pymc3 and Keras does not seem to work anymore.
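As a minimal sketch of the HamiltonianMC point above: pymc3's HamiltonianMC exposes step_scale and path_length arguments for exactly this tuning (the values below are placeholders, not recommendations):
with neural_network:
    # Placeholder tuning values; set both by hand when not using NUTS
    step = pm.HamiltonianMC(step_scale=0.25, path_length=2.0)
    trace = pm.sample(1000, step=step)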

Using Keras LSTM to predict a single example after using batch training

I have a network model that is trained using batch training. Once it is trained, I want to predict the output for a single example.
Here is my model code:
model = Sequential()
model.add(Dense(32, batch_input_shape=(5, 1, 1)))
model.add(LSTM(16, stateful=True))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
I have a sequence of single inputs to single outputs. I'm doing some test code to map characters to next characters (A->B, B->C, etc).
I create input data of shape (15, 1, 1) and output data of shape (15, 1) and call the function:
model.fit(x, y, nb_epoch=epochs, batch_size=5, shuffle=False, verbose=0)
The model trains, and now I want to take a single character and predict the next character (input A, it predicts B). I create an input of shape (1, 1, 1) and call:
pred = model.predict(x, batch_size=1, verbose=0)
This gives:
ValueError: Shape mismatch: x has 5 rows but z has 1 rows
I saw one solution was to add "dummy data" to your predict values, so the input shape for the prediction would be (5,1,1) with data [x 0 0 0 0] and you would just take the first element of the output as your value. However, this seems inefficient when dealing with larger batches.
I also tried to remove the batch size from the model creation, but I got the following message:
ValueError: If a RNN is stateful, a complete input_shape must be provided (including batch size).
Is there another way? Thanks for the help.
Currently (Keras v2.0.8) it takes a bit more effort to get predictions on single rows after training in batch.
Basically, the batch_size is fixed at training time, and has to be the same at prediction time.
The workaround right now is to take the weights from the trained model, and use those as the weights in a new model you've just created, which has a batch_size of 1.
The quick code for that is:
model = create_model(batch_size=64)
model.fit(X, y)
weights = model.get_weights()

single_item_model = create_model(batch_size=1)
single_item_model.set_weights(weights)
single_item_model.compile(compile_params)
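For concreteness, here is a minimal sketch of what the create_model factory assumed above might look like, built directly from the stateful model in the question:
def create_model(batch_size):
    # Same architecture as the question, with the batch size as a parameter
    model = Sequential()
    model.add(Dense(32, batch_input_shape=(batch_size, 1, 1)))
    model.add(LSTM(16, stateful=True))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model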
Here's a blog post that goes into more depth:
https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/
I've used this approach in the past to have multiple models at prediction time: one that makes predictions on big batches, one that makes predictions on small batches, and one that makes predictions on single items. Since batch predictions are much more efficient, this gives us the flexibility to take in any number of prediction rows (not just a number that is evenly divisible by batch_size), while still getting predictions pretty rapidly.
@ClimbsRocks showed a nice workaround. I cannot provide a "correct" answer in the sense of "this is how Keras intends it to be done", but I can share another workaround which might help somebody, depending on the use case.
In this workaround I use predict_on_batch(). This method accepts a single sample out of a batch without throwing an error. Unfortunately, it returns an array shaped according to the training target settings; the first element then holds the prediction for your single sample.
You can access it like this:
to_predict = #Some single sample that would be part of a batch (has to have the right shape)#
model.predict_on_batch(to_predict)[0].flatten() #Flatten is optional
The result of the prediction is exactly the same as if you had passed an entire batch to predict().
Here is some example code.
It comes from my question, which also deals with this issue (but in a slightly different manner).
sequence_size = 5
number_of_features = 1
input = (sequence_size, number_of_features)
batch_size = 2
model = Sequential()
# Of course you can replace the Gated Recurrent Unit with an LSTM layer
model.add(GRU(100, return_sequences=True, activation='relu', input_shape=input, batch_size=2, name="GRU"))
model.add(GRU(1, return_sequences=True, activation='relu', input_shape=input, batch_size=batch_size, name="GRU2"))
model.compile(optimizer='adam', loss='mse')
model.summary()
# Summary output:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
GRU (GRU)                    (2, 5, 100)               30600
_________________________________________________________________
GRU2 (GRU)                   (2, 5, 1)                 306
=================================================================
Total params: 30,906
Trainable params: 30,906
Non-trainable params: 0
def generator(data, batch_size, sequence_size, num_features):
    """Simple generator yielding (batch, shifted-batch) pairs"""
    while True:
        for i in range(len(data) - (sequence_size * batch_size + sequence_size) + 1):
            start = i
            end = i + (sequence_size * batch_size)
            yield data[start:end].reshape(batch_size, sequence_size, num_features), \
                data[end - ((sequence_size * batch_size) - sequence_size):end + sequence_size].reshape(batch_size, sequence_size, num_features)
# Task: predict the continuation of a linear range
data = np.arange(100)
# One batch per generator step; matches the generator's inner loop length
total_batches = len(data) - (sequence_size * batch_size + sequence_size) + 1
hist = model.fit_generator(
    generator=generator(data, batch_size, sequence_size, number_of_features),
    steps_per_epoch=total_batches,
    epochs=200,
    shuffle=False
)
to_predict = np.asarray([[np.asarray([x]) for x in range(95, 100, 1)]])  # Only a single element of a batch
correct = np.asarray([100, 101, 102, 103, 104])
print(model.predict_on_batch(to_predict)[0].flatten())
# Output:
# [ 99.92908 100.95854 102.32129 103.28584 104.20213]

SVM multiclassification with MATLAB R2015a

I am trying to use the MATLAB R2015a classification toolbox for my 4 classes. I imported my dataset and selected a Gaussian kernel to train my classifier. This is my dataset:
myData = [9.36 0; 8.72 0; 9.13 0; 7.38 0; 8.02 0; 12.15 1; 11.02 1; 11.61 1;
    12.31 1; 15.23 1; 52.92 2; 54.49 2; 48.82 2; 52.00 2; 49.79 2; 22.46 3; 30.38 3;
    21.98 3; 24.46 3; 26.08 3];
Then I export it into my workspace to use it with my new test data, but when I try to use it from the workspace, this message appears:
Variables have been created in the base workspace.
To use the exported classifier trainedClassifier to make predictions on new data, T, use
yfit = predict(trainedClassifier, T{:,trainedClassifier.PredictorNames})
If your new data contains any integer variables, then preprocess the data to doubles like this:
X = table2array(varfun(@double, T(:,trainedClassifier.PredictorNames)));
yfit = predict(trainedClassifier, X)
I don't understand what this means exactly, or what T and yfit are.
How can I test my new data with this classifier?
The thing is that you are trying to predict the classes of data stored in a cell array. First import it as a table:
Home > Import Data > (select your file) > Import > (here choose Table as the imported data type). Now you can use your predictor by providing this table name.
yfit = a vector of predicted class labels for the predictor data in table T.
T = sample data, specified as a table. Each row of T corresponds to one observation, and each column corresponds to one predictor variable. Optionally, T can contain additional columns for the response variable and observation weights. T must contain all of the predictors used to train SVMModel. Multi-column variables and cell arrays other than cell arrays of strings are not allowed.
Test data: example
load newdataset
rng(1);
CVSVMModel = fitcsvm(X,Y,'Holdout',0.15,'ClassNames',{'classname1','classname2'},...
'Standardize',true);
CompactSVMModel = CVSVMModel.Trained{1}; % Extract trained, compact classifier
testInds = test(CVSVMModel.Partition); % Extract the test indices
XTest = X(testInds,:);
predict(CompactSVMModel, XTest); % test here

How can GridSearchCV be used for clustering (MeanShift or DBSCAN)?

I'm trying to cluster some text documents using scikit-learn. I'm trying out both DBSCAN and MeanShift and want to determine which hyperparameters (e.g., bandwidth for MeanShift and eps for DBSCAN) work best for the kind of data I'm using (news articles).
I have some testing data which consists of pre-labeled clusters. I have been trying to use scikit-learn's GridSearchCV, but I don't understand how it can be applied in this case (or whether it can be at all), since it needs the test data to be split, whereas I want to run the evaluation on the entire dataset and compare the results to the pre-labeled data.
I have been trying to specify a scoring function which compares the estimator's labels to the true labels, but of course it doesn't work because only a sample of the data has been clustered, not all of it.
What's an appropriate approach here?
The following function for DBSCAN might help. I've written it to iterate over the hyperparameters eps and min_samples, and included optional arguments for the minimum and maximum number of clusters. As DBSCAN is unsupervised, I have not included an evaluation parameter.
def dbscan_grid_search(X_data, lst, clst_count, eps_space=(0.5,),
                       min_samples_space=(5,), min_clust=0, max_clust=10):
    """
    Performs a hyperparameter grid search for DBSCAN.

    Parameters:
        * X_data            = data used to fit the DBSCAN instance
        * lst               = a list to store the results of the grid search
        * clst_count        = a list to store the per-cluster point counts
        * eps_space         = the range of values for the eps parameter
        * min_samples_space = the range of values for the min_samples parameter
        * min_clust         = the minimum number of clusters required for a result to be appended to lst
        * max_clust         = the maximum number of clusters allowed for a result to be appended to lst

    Example:

    # Loading libraries
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    import numpy as np

    # Loading the iris dataset
    iris = datasets.load_iris()
    X = iris.data[:, :]
    y = iris.target

    # Scaling the X data
    dbscan_scaler = StandardScaler()
    dbscan_scaler.fit(X)
    dbscan_X_scaled = dbscan_scaler.transform(X)

    # Setting empty lists in the global environment
    dbscan_clusters = []
    cluster_count = []

    # Inputting function parameters
    dbscan_grid_search(X_data=dbscan_X_scaled,
                       lst=dbscan_clusters,
                       clst_count=cluster_count,
                       eps_space=np.arange(0.1, 5, 0.1),
                       min_samples_space=np.arange(1, 50, 1),
                       min_clust=3,
                       max_clust=6)
    """
    # Importing Counter to count the amount of data in each cluster
    from collections import Counter
    from sklearn.cluster import DBSCAN

    # Starting a tally of total iterations
    n_iterations = 0

    # Looping over each combination of hyperparameters
    for eps_val in eps_space:
        for samples_val in min_samples_space:
            dbscan_grid = DBSCAN(eps=eps_val, min_samples=samples_val)

            # Fitting the model and predicting cluster labels
            clusters = dbscan_grid.fit_predict(X=X_data)

            # Counting the amount of data in each cluster
            cluster_count = Counter(clusters)

            # Number of clusters found, excluding the noise label (-1)
            n_clusters = len(set(clusters)) - (1 if -1 in clusters else 0)

            # Increasing the iteration tally with each run of the loop
            n_iterations += 1

            # Appending to lst each time the cluster-count criteria are met
            if min_clust <= n_clusters <= max_clust:
                lst.append([eps_val, samples_val, n_clusters])
                clst_count.append(cluster_count)

    # Printing grid search summary information
    print(f"Search complete.\nYour list is now of length {len(lst)}.")
    print(f"Hyperparameter combinations checked: {n_iterations}.\n")
Have you considered implementing the search yourself?
It's not particularly hard to implement a for loop, and even if you want to optimize two parameters it's still fairly easy; see the sketch below.
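A minimal sketch of such a loop, assuming X holds your feature vectors and true_labels your pre-labeled clusters (both names are placeholders); adjusted_rand_score compares two labelings over the full dataset, so no train/test split is needed:
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score

best_params, best_score = None, -1.0
for eps in np.arange(0.1, 2.0, 0.1):          # placeholder search ranges
    for min_samples in (2, 5, 10):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        score = adjusted_rand_score(true_labels, labels)
        if score > best_score:
            best_params, best_score = (eps, min_samples), score
print(best_params, best_score)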
For both DBSCAN and MeanShift, however, I advise first understanding your similarity measure. It makes more sense to choose the parameters based on an understanding of your measure than to optimize parameters to match some labels (which carries a high risk of overfitting).
In other words, at what distance are two articles supposed to be clustered together?
If this distance varies too much from one data point to another, these algorithms will fail badly, and you may need to find a normalized distance function such that the actual similarity values are meaningful again. TF-IDF is standard on text, but mostly in a retrieval context; it may work much worse in a clustering context.
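As a hedged illustration of that point: cosine distance on TF-IDF vectors is one common choice of a normalized measure, and scikit-learn's DBSCAN accepts it via the metric argument (the corpus and parameter values below are placeholders):
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["first article ...", "second article ..."]  # placeholder corpus
X_tfidf = TfidfVectorizer().fit_transform(docs)

# Tree-based neighbor search does not support cosine, so use brute force
labels = DBSCAN(eps=0.3, min_samples=5, metric='cosine',
                algorithm='brute').fit_predict(X_tfidf)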
Also beware that MeanShift (similar to k-means) needs to recompute coordinates; on text data, this may yield undesired results where the updated coordinates actually get worse instead of better.