Training siamese neural network on multiple GPUs in Torch: Share not supported for cunn's DataParallelTable

I'm trying to speed up my network implemented in torch7 but I get an error when I try to use nn.DataParallelTable.
This is what I'm trying to do:
m1, m2 = createModel(8,48), createModel(8,48)
-- 8 = number of GPUs, 48 = hidden units in the last layer
m2:share(m1,'weight', 'bias') ----THE ERROR IS HERE
prl = nn.ParallelTable()
prl:add(m1)
prl:add(m2)
prl:cuda()
mlp = nn.Sequential()
mlp:add(prl)
mlp:cuda()
crit = nn.CosineEmbeddingCriterion():cuda()
Where the functions are:
function createModel(nGPU,bot)
   local features = nn.Concat(2)
   local fb1 = nn.Sequential() -- branch 1
   fb1:add(nn.SpatialConvolution(1,48,3,3,1,1,1,1))
   fb1:add(nn.ReLU(true))
   fb1:add(nn.SpatialConvolution(48,128,3,3,1,1,1,1))
   fb1:add(nn.ReLU(true))
   fb1:add(nn.SpatialConvolution(128,192,3,3,1,1,1,1))
   fb1:add(nn.ReLU(true))
   fb1:add(nn.SpatialConvolution(192,192,3,3,1,1,1,1))
   fb1:add(nn.ReLU(true))
   fb1:add(nn.SpatialConvolution(192,128,3,3,1,1,1,1))
   fb1:add(nn.ReLU(true))
   fb1:add(nn.SpatialMaxPooling(2,2,2,2))
   view = 12
   local fb2 = fb1:clone() -- branch 2
   for k,v in ipairs(fb2:findModules('nn.SpatialConvolution')) do
      v:reset() -- reset branch 2's weights
   end
   features:add(fb1) features:add(fb2) features:cuda()
   --------------the error is at this line-----------
   features = makeDataParallel(features, nGPU)
   local classifier = nn.Sequential()
   classifier:add(nn.View(256*view*view))
   classifier:add(nn.Dropout(0.5))
   classifier:add(nn.Linear(256*view*view, 4096))
   classifier:add(nn.Dropout(0.5))
   classifier:add(nn.Linear(4096, 4096))
   classifier:add(nn.Tanh())
   classifier:add(nn.Linear(4096, bot))
   classifier:add(nn.Tanh())
   classifier:cuda()
   local model = nn.Sequential():add(features):add(classifier)
   return model
end
and the other one is:
function makeDataParallel(model, nGPU)
   if nGPU > 1 then
      print('converting module to nn.DataParallelTable')
      assert(nGPU <= cutorch.getDeviceCount(), 'number of GPUs less than nGPU specified')
      local model_single = model
      model = nn.DataParallelTable(1)
      for i=1, nGPU do
         cutorch.setDevice(i)
         model:add(model_single:clone():cuda(), i)
      end
   end
   cutorch.setDevice(1)
   return model
end
The error I get is:
[C]: in function 'error'
...a/torch/install/share/lua/5.1/cunn/DataParallelTable.lua:337: in function 'share'
/home/andrea/torch/install/share/lua/5.1/nn/Container.lua:97: in function 'share'
main.lua:123: in main chunk
[C]: at 0x00406670
Do you know where the error might be? Sorry, I'm fairly new to this and can't figure it out. I have probably set up the network structure wrongly. Thanks in advance.

Related

Talos --> TypeError: __init__() got an unexpected keyword argument 'grid_downsample'

I am trying to run a hyperparameters optimization with Talos. As I have a lot of parameters to test, I want to use a 'grid_downsample' argument that will select 30% of all possible hyperparameters combinations. However when I run my code I get: TypeError: __init__() got an unexpected keyword argument 'grid_downsample'
I tested the code below without the 'grid_downsample' option and with fewer hyperparameters.
#load data
data = pd.read_csv('data.txt', sep="\t", encoding = "latin1")
# split into input (X) and output (y) variables
Y = np.array(data['Y'])
data_bis = data.drop(['Y'], axis = 1)
X = np.array(data_bis)
p = {'activation':['relu'],
'optimizer': ['Nadam'],
'first_hidden_layer': [12],
'second_hidden_layer': [12],
'batch_size': [20],
'epochs': [10,20],
'dropout_rate':[0.0, 0.2]}
def dnn_model(x_train, y_train, x_val, y_val, params):
    model = Sequential()
    #input layer
    model.add(Dense(params['first_hidden_layer'], input_shape=(1024,)))
    model.add(Dropout(params['dropout_rate']))
    model.add(Activation(params['activation']))
    #hidden layer 2
    model.add(Dense(params['second_hidden_layer']))
    model.add(Dropout(params['dropout_rate']))
    model.add(Activation(params['activation']))
    # output layer with one node
    model.add(Dense(1))
    model.add(Activation(params['activation']))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=params['optimizer'], metrics=['accuracy'])
    out = model.fit(x_train, y_train,
                    batch_size=params['batch_size'],
                    epochs=params['epochs'],
                    validation_data=[x_val, y_val],
                    verbose=0)
    return out, model
scan_object = ta.Scan(X, Y, model=dnn_model, params=p, experiment_name="test")
reporting = ta.Reporting(scan_object)
report = reporting.data
report.to_csv('./Random_search/dnn/report_talos.txt', sep = '\t')
This code works well. If I change the scan_object at the end to: scan_object = ta.Scan(X, Y, model=dnn_model, grid_downsample=0.3, params=p, experiment_name="test"), it gives me the error: TypeError: __init__() got an unexpected keyword argument 'grid_downsample', while I was expecting the same results format as a normal grid search but with fewer combinations. What am I missing? Did the name of the argument change? I'm using Talos 0.6.3 in a conda environment.
Thank you!
It might be too late for you now, but the argument has been renamed to fraction_limit. In your case it would look like this (the example keeps 10% of the combinations; use 0.3 for 30%):
scan_object = ta.Scan(X, Y, model=dnn_model, params=p, experiment_name="test", fraction_limit = 0.1)
Sadly, the documentation hasn't been fully updated.
Check out their examples on GitHub:
https://github.com/autonomio/talos/blob/master/examples/Hyperparameter%20Optimization%20with%20Keras%20for%20the%20Iris%20Prediction.ipynb
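To make concrete what a downsampling fraction does to a parameter grid, here is a minimal, library-free sketch. It is not Talos's implementation: `downsample_grid` and its `seed` argument are hypothetical names used only for illustration, and the rounding rule (round to nearest, keep at least one) is an assumption.

```python
import itertools
import random


def downsample_grid(params, fraction, seed=0):
    """Return a random subset of the full parameter grid.

    Hypothetical helper for illustration only; Talos does its own sampling.
    """
    keys = sorted(params)
    # Build every combination of the listed values, one dict per combination.
    grid = [dict(zip(keys, values))
            for values in itertools.product(*(params[k] for k in keys))]
    # Assumption: round to the nearest count, but always keep at least one.
    n_keep = max(1, round(len(grid) * fraction))
    return random.Random(seed).sample(grid, n_keep)


p = {'batch_size': [20, 40],
     'epochs': [10, 20],
     'dropout_rate': [0.0, 0.2, 0.4, 0.5, 0.6]}

subset = downsample_grid(p, fraction=0.3)
print(len(subset), "of", 2 * 2 * 5, "combinations kept")
```

Each entry of `subset` is an ordinary dict with the same keys as `p`, which is the shape a model function such as dnn_model receives as `params`.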

SqueezeNet Deep Compression

Does anyone know where or how to obtain the 0.47 MB version of SqueezeNet?
In other words, how can I make the weight bit width 6 instead of 8?
I cannot find the place to modify in this SqueezeNet generation code.
With the following method I got a 0.77 MB model. Let's assume we have a SqueezeNet_model. First, we can convert SqueezeNet to a TensorFlow Lite model:
import os
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model(SqueezeNet_model)
tflite_model = converter.convert()
open("SqueezeNet_model.tflite", "wb").write(tflite_model)
Then we can use post-training quantization to decrease the size of the model:
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
open("SqueezeNet_Quant_model.tflite", "wb").write(tflite_quant_model)
print("Quantized model in MB:", os.path.getsize('SqueezeNet_Quant_model.tflite') / float(2**20))  # I got a 0.77 MB model
Finally, we can test our model with:
# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="SqueezeNet_Quant_model.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test model on some input data.
input_shape = input_details[0]['shape']
acc=0
for i in range(len(x_test)):
    input_data = np.array(x_test[i].reshape(input_shape), dtype=np.float32)
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]['index'])
    if(np.argmax(output_data) == np.argmax(y_test[i])):
        acc+=1
acc = acc/len(x_test)
print(acc*100)
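On the original question of 6-bit versus 8-bit weights: tf.lite.Optimize.DEFAULT stores weights as 8-bit integers, so by itself it does not give a 6-bit model. As a rough illustration of what 6-bit weights mean, the sketch below maps floats onto 2**6 = 64 evenly spaced levels. The uniform scheme and the helper names `quantize` and `dequantize` are assumptions made for this example; Deep Compression, which is behind the 0.47 MB figure, relies on a learned codebook that this sketch does not reproduce.

```python
def quantize(weights, bits=6):
    """Map floats to integer codes in [0, 2**bits - 1] on a uniform grid."""
    levels = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    # Step between neighbouring levels; fall back to 1.0 for constant input.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate float weights from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-0.50, -0.12, 0.0, 0.07, 0.31, 0.50]
codes, lo, scale = quantize(weights, bits=6)
restored = dequantize(codes, lo, scale)

# Largest absolute error introduced by rounding to the nearest level.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(max_error)
```

Each code fits in 6 bits instead of 8, so storage per weight drops by a quarter before any further compression such as pruning or entropy coding.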

Tensorflow doesn't want to use GPU

I want to train the "Stanford chatbot" from https://github.com/chiphuyen/stanford-tensorflow-tutorials/tree/master/assignments/chatbot on a GPU, but it doesn't use my GPU even though all the needed libraries (cuDNN, CUDA, tensorflow-gpu, etc.) are installed.
I tried:
def train():
    """ Train the bot """
    test_buckets, data_buckets, train_buckets_scale = _get_buckets()
    # in train mode, we need to create the backward path, so forward_only is False
    model = ChatBotModel(False, config.BATCH_SIZE)
    model.build_graph()
    saver = tf.train.Saver(var_list=tf.trainable_variables())
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True,log_device_placement=True)) as sess:
        print('Start training')
        sess.run(tf.global_variables_initializer())
        _check_restore_parameters(sess, saver)
        iteration = model.global_step.eval()
        total_loss = 0
        while True:
            skip_step = _get_skip_step(iteration)
            bucket_id = _get_random_bucket(train_buckets_scale)
            encoder_inputs, decoder_inputs, decoder_masks = data.get_batch(data_buckets[bucket_id],
                                                                           bucket_id,
                                                                           batch_size=config.BATCH_SIZE)
            start = time.time()
            _, step_loss, _ = run_step(sess, model, encoder_inputs, decoder_inputs, decoder_masks, bucket_id, False)
            total_loss += step_loss
            iteration += 1
            if iteration % skip_step == 0:
                print('Iter {}: loss {}, time {}'.format(iteration, total_loss/skip_step, time.time() - start))
                start = time.time()
                total_loss = 0
                saver.save(sess, os.path.join(config.CPT_PATH, 'chatbot'), global_step=model.global_step)
                if iteration % (10 * skip_step) == 0:
                    # Run evals on development set and print their loss
                    _eval_test_set(sess, model, test_buckets)
                    start = time.time()
                sys.stdout.flush()
But it always shows:
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'save/Const': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Const: CPU
Identity: CPU
[[Node: save/Const = Constdtype=DT_STRING, value=Tensor, _device="/device:GPU:0"]]
Is there a configuration file for TensorFlow where I can specify that only the GPU should be used, or some other way to do this? (I tried with tf.device("/gpu:0"): and device_count={'GPU': 1}.)
From your error:
Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
That means the 'save/Const' operation cannot be forcibly assigned to a GPU via with tf.device(): because there is no GPU implementation for it. Remove the with tf.device(): part (or put that operation outside of it) and let TF decide where to place operations (it will prefer GPU over CPU anyway).

Tensorflow: Cannot interpret feed_dict key as Tensor

I am trying to build a neural network model with one hidden layer (1024 nodes). The hidden layer is nothing but a relu unit. I am also processing the input data in batches of 128.
The inputs are images of size 28 * 28. In the following code I get the error in line
_, c = sess.run([optimizer, loss], feed_dict={x: batch_x, y: batch_y})
Error: TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("Placeholder_64:0", shape=(128, 784), dtype=float32) is not an element of this graph.
Here is the code I have written
#Initialize
batch_size = 128
layer1_input = 28 * 28
hidden_layer1 = 1024
num_labels = 10
num_steps = 3001
#Create neural network model
def create_model(inp, w, b):
    layer1 = tf.add(tf.matmul(inp, w['w1']), b['b1'])
    layer1 = tf.nn.relu(layer1)
    layer2 = tf.matmul(layer1, w['w2']) + b['b2']
    return layer2
#Initialize variables
x = tf.placeholder(tf.float32, shape=(batch_size, layer1_input))
y = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
w = {
'w1': tf.Variable(tf.random_normal([layer1_input, hidden_layer1])),
'w2': tf.Variable(tf.random_normal([hidden_layer1, num_labels]))
}
b = {
'b1': tf.Variable(tf.zeros([hidden_layer1])),
'b2': tf.Variable(tf.zeros([num_labels]))
}
init = tf.initialize_all_variables()
train_prediction = tf.nn.softmax(model)
tf_valid_dataset = tf.constant(valid_dataset)
tf_test_dataset = tf.constant(test_dataset)
model = create_model(x, w, b)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(model, y))
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
#Process
with tf.Session(graph=graph1) as sess:
    tf.initialize_all_variables().run()
    total_batch = int(train_dataset.shape[0] / batch_size)
    for epoch in range(num_steps):
        loss = 0
        for i in range(total_batch):
            batch_x, batch_y = train_dataset[epoch * batch_size:(epoch+1) * batch_size, :], train_labels[epoch * batch_size:(epoch+1) * batch_size,:]
            _, c = sess.run([optimizer, loss], feed_dict={x: batch_x, y: batch_y})
            loss = loss + c
        loss = loss / total_batch
        if epoch % 500 == 0:
            print ("Epoch :", epoch, ". cost = {:.9f}".format(avg_cost))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            valid_prediction = tf.run(tf_valid_dataset, {x: tf_valid_dataset})
            print("Validation accuracy: %.1f%%" % accuracy(valid_prediction.eval(), valid_labels))
    test_prediction = tf.run(tf_test_dataset, {x: tf_test_dataset})
    print("TEST accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
This worked for me. I imported the Keras backend:
from keras import backend as K
After predicting on my data I called the following, and then loaded the model again:
K.clear_session()
I faced this problem on a production server, but on my PC it was running fine.
...........
from keras import backend as K
#Before prediction
K.clear_session()
#After prediction
K.clear_session()
The variable x is not in the same graph as model. Try to define all of these in the same graph scope. For example:
# define a graph
graph1 = tf.Graph()
with graph1.as_default():
    # placeholder
    x = tf.placeholder(...)
    y = tf.placeholder(...)
    # create model
    model = create_model(x, w, b)
with tf.Session(graph=graph1) as sess:
    # initialize all the variables
    sess.run(init)
    # then feed_dict
    # ......
If you use the Django development server, just run runserver with --nothreading, for example:
python manage.py runserver --nothreading
I had the same issue with Flask. Adding the --without-threads flag to flask run, or threaded=False to app.run(), fixed it.
In my case, I was calling a CNN multiple times in a loop. I fixed my problem by doing the following:
# Declare this as global:
global graph
graph = tf.get_default_graph()
# Then just before you call your model, use this
with graph.as_default():
    # call your models here
Note: In my case too, the app ran fine the first time and then gave the error above. Using the above fix solved the problem.
Hope that helps.
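The threading-related fixes above share one cause: in TensorFlow 1.x the default graph is tracked per thread, so a model built in one thread is not in the default graph that a server's worker thread sees. The sketch below imitates that behaviour with threading.local from the standard library. The functions `get_default_graph` and `as_default` and the string graph names are stand-ins written for this illustration, not TensorFlow code.

```python
import threading
from contextlib import contextmanager

_state = threading.local()  # each thread gets its own "default graph" slot


def get_default_graph():
    # A thread that never set a default gets a fresh, thread-specific one.
    if not hasattr(_state, "graph"):
        _state.graph = "graph-of-" + threading.current_thread().name
    return _state.graph


@contextmanager
def as_default(graph):
    # Temporarily make `graph` the default for the current thread only.
    previous = get_default_graph()
    _state.graph = graph
    try:
        yield graph
    finally:
        _state.graph = previous


main_graph = get_default_graph()  # captured once, where the model is loaded
seen = {}


def worker():
    # Without the fix the worker sees its own default, not the model's graph.
    seen["without_fix"] = get_default_graph()
    # With the fix the captured graph is made current inside the block.
    with as_default(main_graph):
        seen["with_fix"] = get_default_graph()


t = threading.Thread(target=worker, name="worker")
t.start()
t.join()

print(seen["without_fix"] == main_graph)
print(seen["with_fix"] == main_graph)
```

Running the server single-threaded avoids the mismatch entirely, while capturing the graph and re-entering it in each worker keeps threading enabled.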
The error message TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("...", dtype=dtype) is not an element of this graph can also arise in case you run a session outside of the scope of its with statement. Consider:
with tf.Session() as sess:
    sess.run(logits, feed_dict=feed_dict)
sess.run(logits, feed_dict=feed_dict)
If logits and feed_dict are defined properly, the first sess.run command will execute normally, but the second will raise the mentioned error.
You can also run into this while working in notebooks hosted on online learning platforms like Coursera. In that case, the following code may help resolve the issue.
Put this in the topmost cell of the notebook file:
from keras import backend as K
K.clear_session()
Similar to @javan-peymanfard and @hmadali-shafiee, I ran into this issue when loading the model in an API. I was using FastAPI with uvicorn. To fix the issue I just set the API function definitions to async, similar to this:
@app.post('/endpoint_name')
async def endpoint_function():
    # Do stuff here, including possibly (re)loading the model

Simpy: How can I represent failures in a train subway simulation?

New Python user here and first post on this great website. I haven't been able to find an answer to my question, so hopefully it is unique.
Using SimPy I am trying to create a train subway/metro simulation with failures and repairs periodically built into the system. These failures happen to the train but also to signals on sections of track and on platforms. I have read and applied the official Machine Shop example (which you can see a resemblance to in the attached code) and have thus managed to model random failures and repairs to the train by interrupting its 'journey time'.
However, I have not figured out how to model failures of signals on the routes which the trains follow. I am currently just specifying a time for a trip from A to B, which does get interrupted, but only due to train failure.
Is it possible to define each trip as its own process, i.e. a separate process for sections A_to_B and B_to_C, and separate platforms as pA, pB and pC, each one with a single resource (to allow only one train on it at a time), and to incorporate random failures and repairs for these section and platform processes? I would also perhaps need several sections between two platforms, any of which could experience a failure.
Any help would be greatly appreciated.
Here's my code so far:
import random
import simpy
import numpy
RANDOM_SEED = 1234
T_MEAN_A = 240.0 # mean journey time
T_MEAN_EXPO_A = 1/T_MEAN_A # for exponential distribution
T_MEAN_B = 240.0 # mean journey time
T_MEAN_EXPO_B = 1/T_MEAN_B # for exponential distribution
DWELL_TIME = 30.0 # amount of time train sits at platform for passengers
DWELL_TIME_EXPO = 1/DWELL_TIME
MTTF = 3600.0 # mean time to failure (seconds)
TTF_MEAN = 1/MTTF # for exponential distribution
REPAIR_TIME = 240.0
REPAIR_TIME_EXPO = 1/REPAIR_TIME
NUM_TRAINS = 1
SIM_TIME_DAYS = 100
SIM_TIME = 3600 * 18 * SIM_TIME_DAYS
SIM_TIME_HOURS = SIM_TIME/3600
# Defining the times for processes
def A_B(): # returns processing time for journey A to B
    return random.expovariate(T_MEAN_EXPO_A) + random.expovariate(DWELL_TIME_EXPO)
def B_C(): # returns processing time for journey B to C
    return random.expovariate(T_MEAN_EXPO_B) + random.expovariate(DWELL_TIME_EXPO)
def time_to_failure(): # returns time until next failure
    return random.expovariate(TTF_MEAN)
# Defining the train
class Train(object):
    def __init__(self, env, name, repair):
        self.env = env
        self.name = name
        self.trips_complete = 0
        self.broken = False
        # Start "travelling" and "break_train" processes for the train
        self.process = env.process(self.running(repair))
        env.process(self.break_train())
    def running(self, repair):
        while True:
            # start trip A_B
            done_in = A_B()
            while done_in:
                try:
                    # going on the trip
                    start = self.env.now
                    yield self.env.timeout(done_in)
                    done_in = 0 # Set to 0 to exit while loop
                except simpy.Interrupt:
                    self.broken = True
                    done_in -= self.env.now - start # How much time left?
                    with repair.request(priority = 1) as req:
                        yield req
                        yield self.env.timeout(random.expovariate(REPAIR_TIME_EXPO))
                    self.broken = False
            # Trip is finished
            self.trips_complete += 1
            # start trip B_C
            done_in = B_C()
            while done_in:
                try:
                    # going on the trip
                    start = self.env.now
                    yield self.env.timeout(done_in)
                    done_in = 0 # Set to 0 to exit while loop
                except simpy.Interrupt:
                    self.broken = True
                    done_in -= self.env.now - start # How much time left?
                    with repair.request(priority = 1) as req:
                        yield req
                        yield self.env.timeout(random.expovariate(REPAIR_TIME_EXPO))
                    self.broken = False
            # Trip is finished
            self.trips_complete += 1
    # Defining the failure
    def break_train(self):
        while True:
            yield self.env.timeout(time_to_failure())
            if not self.broken:
                # Only break the train if it is currently working
                self.process.interrupt()
# Setup and start the simulation
print('Train trip simulator')
random.seed(RANDOM_SEED) # Helps with reproduction
# Create an environment and start setup process
env = simpy.Environment()
repair = simpy.PreemptiveResource(env, capacity = 1)
trains = [Train(env, 'Train %d' % i, repair)
for i in range(NUM_TRAINS)]
# Execute
env.run(until = SIM_TIME)
# Analysis
trips = []
print('Train trips after %s hours of simulation' % SIM_TIME_HOURS)
for train in trains:
    print('%s completed %d trips.' % (train.name, train.trips_complete))
    trips.append(train.trips_complete)
mean_trips = numpy.mean(trips)
std_trips = numpy.std(trips)
print "mean trips: %d" % mean_trips
print "standard deviation trips: %d" % std_trips
It looks like you are using Python 2, which is a bit unfortunate, because
Python 3.3 and above give you some more flexibility with Python generators. But
your problem should be solvable in Python 2 nonetheless.
You can use sub-processes within a process:
def sub(env):
    print('I am a sub process')
    yield env.timeout(1)
    # return 23 # Only works in py3.3 and above
    env.exit(23) # Workaround for older python versions
def main(env):
    print('I am the main process')
    retval = yield env.process(sub(env))
    print('Sub returned', retval)
As you can see, you can use Process instances returned by Environment.process()
like normal events. You can even use return values in your sub-processes.
If you use Python 3.3 or newer, you don’t have to explicitly start a new
sub-process but can use sub() as a subroutine instead and just forward the
events it yields:
def sub(env):
    print('I am a sub routine')
    yield env.timeout(1)
    return 23
def main(env):
    print('I am the main process')
    retval = yield from sub(env)
    print('Sub returned', retval)
You may also be able to model signals as resources that may be used either
by a failure process or by a train. If the failure process requests the signal
first, the train has to wait in front of the signal until the failure
process releases the signal resource. If the train is already passing the
signal (and thus has the resource), the signal cannot break. I don’t think
that’s a problem because the train can’t stop anyway. If it should be
a problem, just use a PreemptiveResource.
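The signal-as-resource idea could be sketched roughly as follows. This is only an illustration under assumptions: the names `signal_failures`, `train` and `run_demo`, the fixed travel and repair times, and the two-section layout are all made up for the example. SimPy is imported inside `run_demo` so that the definitions load even where SimPy is not installed.

```python
import random


def signal_failures(env, signal, mttf, repair_time, log):
    """Failure process: hold the signal resource for as long as it is broken."""
    while True:
        yield env.timeout(random.expovariate(1.0 / mttf))
        with signal.request() as req:
            yield req  # waits if a train currently holds the signal
            log.append(("signal down", env.now))
            yield env.timeout(repair_time)
            log.append(("signal up", env.now))


def train(env, name, sections, travel_time, log):
    """Train process: pass each section in turn, one signal at a time."""
    for label, signal in sections:
        with signal.request() as req:
            yield req  # blocks while the signal is failed or occupied
            yield env.timeout(travel_time)
        log.append((name + " cleared " + label, env.now))


def run_demo(seed=1234, until=3600):
    """Build a two-section line, run it, and return the event log."""
    import simpy  # imported lazily; requires SimPy to be installed

    random.seed(seed)
    env = simpy.Environment()
    log = []
    # One single-capacity resource per section, shared by trains and failures.
    sections = [("A_to_B", simpy.Resource(env, capacity=1)),
                ("B_to_C", simpy.Resource(env, capacity=1))]
    for _, signal in sections:
        env.process(signal_failures(env, signal, mttf=600.0,
                                    repair_time=120.0, log=log))
    env.process(train(env, "Train 0", sections, travel_time=240.0, log=log))
    env.run(until=until)
    return log
```

Calling run_demo() returns a list of (event, time) tuples. Because a plain Resource is used, a failure that strikes while the train holds the section only takes effect after the train has left; swapping in a PreemptiveResource would let the failure interrupt the train instead.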
I hope this helps. Please feel welcome to join our mailing list for more
discussions.