Convert pointcloud csv to hdf5 to train on PointCNN network - classification

I am trying to train my point cloud data on PointCNN so I need to convert my dataset to hdf5 as used in PointCNN. PointCNN used the modelnet40_ply_hdf5_2048 dataset.
I have tried converting my custom dataset but I am having issues with the label.
I tried this to get the label/shape_names
shape_ids = {}
shape_ids = [line.rstrip() for line in open(os.path.join(PATH, 'filelist1.txt'))]
shape_names = ['_'.join(x.split('_')[0:-1]) for x in shape_ids]
datapath = [(shape_names[i], os.path.join(PATH, shape_names[i], shape_ids[i])) for i
in range(len(shape_ids))]
Convert to h5py file
import numpy as np
from tqdm import tqdm
import h5py
filenames = [line.rstrip() for line in open(os.path.join(PATH))]
f = h5py.File("filename", 'w')
data = np.zeros((len(filenames), 1024, 3))
for i in range(0, len(datapath)):
fn = datapath[i]
cls = classes[datapath[i][0]]
label = np.array([cls]).astype(np.int32)
csvreader = np.genfromtxt("data1/" + filenames[i] + ".csv", delimiter=",").astype(np.float32)
for j in range(0,1024):
data[i,j] = [csvreader[j][0], csvreader[j][1], csvreader[j][2]]
label
dset1 = f.create_dataset("data", data=data, compression="gzip", compression_opts=4)
dset2 = f.create_dataset("label", data=label, compression="gzip", compression_opts=1)
f.close()
It did convert successfully but when I tried to train on PointCNN
PointCNN training
------Building model-------
------Successfully Built model-------
Traceback (most recent call last):
File "train_pytorch.py", line 174, in <module>
current_data, current_label, _ = provider.shuffle_data(current_data, np.squeeze(current_label))
File "provider.py", line 28, in shuffle_data
idx = np.arange(len(labels))
TypeError: len() of unsized object

Related

Getting 'OSError: -2' while converting a tif image into jpg image using python

I'm trying to convert tiff images into jpg format and use it later in opencv. It is working fine in my local system but when I am executing it over linux server which is not connected to internet it is getting failed while saving the Image object as jpg format.
I'm using python3.8 and had installed all the libraries and its dependencies using wheel files over server using pip.
Here is the piece of code:
import PIL
import cv2
def face_detect(sourceImagepath1, processedFileName, imagename, pdfname):
temp_path = TEMP_PATH
processed_path = PROCESSED_PATH
misc_path = MISC_PATH
# cascade file path1
cascpath1 = misc_path + 'frontalface_cascade.xml'
# Create harr cascade
faceCascade = cv2.CascadeClassifier(cascpath1)
# Read image with PIL
image_pil = Image.open(sourceImagepath1)
# Save image in jpg format
image_pil.save(temp_path + processedFileName + '.jpg')
# Read image with opencv
image_cv = cv2.imread(temp_path + processedFileName + '.jpg')
# Convert image into grayscale
image_gray = cv2.cvtColor(image_cv, cv2.COLOR_BGR2GRAY)
# Detect faces in the image
face = faceCascade.detectMultiScale(
image_gray,
scaleFactor=1.3,
minNeighbors=5,
minSize=(30, 30)
# flags = cv2.CASCADE_SCALE_IMAGE
)
if len(face) > 0:
# Coordinates based on auto-face detection
x, y, w, h = face[0][0], face[0][1], face[0][2], face[0][3]
crop_image = image_pil.crop([x - 20, y - 30, x + w + 40, y + h + 60])
crop_image.save(processed_path + imagename)
# Save tif file as pdf
image_pil.save(processed_path + pdfname, save_all=True)
# Close image object
image_pil.close()
return len(face)
Here TEMP_PATH,PROCESSED_PATH,MISC_PATH are global variables of syntax like '/Users/user/Documents/Temp/'. I'm getting error on line:
image_pil.save(temp_path + processedFileName + '.jpg')
Below is the error i'm getting when executing the file
Traceback (most recent call last):
File "*path_from_root_directory*/PYTHON_SCRIPTS/Script/staging.py", line 363, in <module>
auto_face_count = face_detect(sourceImagepath1, processedFileName, imagename, pdfname)
File "*path_from_root_directory*/PYTHON_SCRIPTS/Script/staging.py", line 71, in greyScaleCheck
image_pil.save(temp_path + processedFileName + '.jpg')
File "*path_from_root_directory*/python3.8/site-packages/PIL/Image.py", line 2201, in save
self._ensure_mutable()
File "*path_from_root_directory*/python3.8/site-packages/PIL/Image.py", line 624, in _ensure_mutable
self._copy()
File "*path_from_root_directory*/python3.8/site-packages/PIL/Image.py", line 617, in _copy
self.load()
File "*path_from_root_directory*/python3.8/site-packages/PIL/TiffImagePlugin.py", line 1122, in load
return self._load_libtiff()
File "*path_from_root_directory*/python3.8/site-packages/PIL/TiffImagePlugin.py", line 1226, in _load_libtiff
raise OSError(err)
OSError: -2
I have provided full privileges to python directory and its sub-directories/files. Anyone have any idea why I'm getting this error ?

When I add workers to neutral network I get an error, pytorch

I have checked a lot of post and none of them seem to work for me. But when I try to add workers to the dataloader in pytorch it just feeds me an error back. I have tired reading it and figuring it out but I can't seem to find a solution. I assume there is something I'm supposed to add to make the workers able to do their job.
I have 64GB of ram, i9-9900k, and a 3080ti. So I don't think its a memory error is it?
I included the error code with 1 worker and 4 workers because they seem to be different.
Also it works with zero workers.
here is the error with 4 workers:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 990, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\queue.py", line 172, in get raise Empty
queue.Empty
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:/Users/14055/Desktop/Class 1 Project/Chegg.py", line 202, in <module>
training()
File "c:/Users/14055/Desktop/Class 1 Project/Chegg.py", line 122, in training
for data, target in load_data.train_loader:
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__ data = self._next_data()
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 1186, in _next_data idx, data = self._get_data()
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 1142, in _get_data success, data = self._try_get_data()
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 1003, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 23204, 7668, 13636, 6132) exited unexpectedly
Error with 1 worker:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "c:\Users\14055\Desktop\Class 1 Project\Chegg.py", line 202, in <module>
training()
File "c:\Users\14055\Desktop\Class 1 Project\Chegg.py", line 122, in training
for data, target in load_data.train_loader:
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
return self._get_iterator()
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
w.start()
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 990, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\queue.py", line 172, in get
raise Empty
queue.Empty
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:/Users/14055/Desktop/Class 1 Project/Chegg.py", line 202, in <module>
training()
File "c:/Users/14055/Desktop/Class 1 Project/Chegg.py", line 122, in training
for data, target in load_data.train_loader:
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
data = self._next_data()
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 1186, in _next_data
idx, data = self._get_data()
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 1142, in _get_data
success, data = self._try_get_data()
File "C:\Users\14055\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 1003, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 3372) exited unexpectedly
Code:
from numpy import testing
import torch.cuda
import numpy as np
import time
import array as arr
import os
from datetime import date, datetime
from torchvision import datasets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary
torch.cuda.set_device(0)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
def load_data():
num_workers = 1
load_data.batch_size = 20
transform = transforms.ToTensor()
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
load_data.train_loader = torch.utils.data.DataLoader(train_data,
batch_size=load_data.batch_size, num_workers=num_workers, pin_memory=True,
shuffle=True)
test_data = datasets.MNIST(root='data', train=False, download=True, transform=transform)
load_data.test_loader = torch.utils.data.DataLoader(test_data,
batch_size=load_data.batch_size, num_workers=num_workers, pin_memory=True,
shuffle=True)
def visualize():
dataiter = iter(load_data.train_loader)
visualize.images, labels = dataiter.next()
visualize.images = visualize.images.numpy()
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(load_data.batch_size):
ax = fig.add_subplot(2, load_data.batch_size/2, idx+1, xticks=[], yticks=[])
ax.imshow(np.squeeze(visualize.images[idx]), cmap='gray')
ax.set_title(str(labels[idx].item()))
#plt.show()
def fig_values():
img = np.squeeze(visualize.images[1])
fig = plt.figure(figsize = (12,12))
ax = fig.add_subplot(111)
ax.imshow(img, cmap='gray')
width, height = img.shape
thresh = img.max()/2.5
for x in range(width):
for y in range(height):
val = round(img[x][y],2) if img[x][y] !=0 else 0
ax.annotate(str(val), xy=(y,x),
horizontalalignment='center',
verticalalignment='center',
color='white' if img[x][y]<thresh else 'black')
#plt.show()
load_data()
#visualize()
#fig_values()
class NeuralNet(nn.Module):
def __init__(self, gpu = True):
super(NeuralNet, self ).__init__()
self.conv1 = nn.Conv2d(in_channels=1, out_channels=128, kernel_size=3, padding=1)
self.bn1 = nn.BatchNorm2d(num_features=128)
self.tns1 = nn.Conv2d(in_channels=128, out_channels=4, kernel_size=1, padding=1)
self.conv2 = nn.Conv2d(in_channels=4, out_channels=16, kernel_size=3, padding=1)
self.bn2 = nn.BatchNorm2d(num_features=16)
self.pool1 = nn.MaxPool2d(2,2)
self.conv3 = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1)
self.bn3 = nn.BatchNorm2d(num_features=16)
self.conv4 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
self.bn4 = nn.BatchNorm2d(num_features=32)
self.pool2 = nn.MaxPool2d(2,2)
self.tns2 = nn.Conv2d(in_channels=32, out_channels=16, kernel_size=1, padding=1)
self.conv5 = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1)
self.bn5 = nn.BatchNorm2d(num_features=16)
self.conv6 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
self.bn6 = nn.BatchNorm2d(num_features=32)
self.conv7 = nn.Conv2d(in_channels=32, out_channels=10, kernel_size=1, padding=1)
self.gpool = nn.AvgPool2d(kernel_size=7)
self.drop = nn.Dropout2d(0.1)
def forward(self, x):
x = self.tns1(self.drop(self.bn1(F.relu(self.conv1(x)))))
x = self.drop(self.bn2(F.relu(self.conv2(x))))
x = self.pool1(x)
x = self.drop(self.bn3(F.relu(self.conv3(x))))
x = self.drop(self.bn4(F.relu(self.conv4(x))))
x = self.tns2(self.pool2(x))
x = self.drop(self.bn5(F.relu(self.conv5(x))))
x = self.drop(self.bn6(F.relu(self.conv6(x))))
x = self.conv7(x)
x = self.gpool(x)
x = x.view(-1, 10)
return F.log_softmax(x).to(device)
#has antioverfit
def training():
model.to(device)
optimizer= torch.optim.SGD(model.parameters(), lr=0.003, weight_decay= 0.00005, momentum = .9, nesterov = True)
n_epochs = 20000
a = np.float64([9,9,9,9,9]) #antioverfit
testing_loss = 0.0
for epoch in range(n_epochs) :
if(testing_loss <= a[4]): # part of anti overfit
train_loss = 0.0
testing_loss = 0.0
model.train().to(device)
for data, target in load_data.train_loader:
optimizer.zero_grad()
data = data.to(device) #gpu
target = target.to(device) #gpu
output = model(data).to(device)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
train_loss += loss.item()*data.size(0)
train_loss = train_loss/len(load_data.train_loader.dataset)
print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch+1, train_loss))
model.eval().to(device) # Gets Validation loss
train_loss = 0.0
with torch.no_grad():
for data, target in load_data.test_loader:
data = data.to(device)
target = target.to(device)
output = model(data).to(device)
loss =F.nll_loss(output, target)
testing_loss += loss.item()*data.size(0)
testing_loss = testing_loss / len(load_data.test_loader.dataset)
print('Validation loss = ' , testing_loss)
a = np.insert(a,0,testing_loss) # part of anti overfit
a = np.delete(a,5)
print('Validation loss = ' , testing_loss)
def evalution():
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
model.eval().to(device)
for data, target in load_data.test_loader:
data = data.to(device)
target = target.to(device)
output = model(data).to(device)
loss =F.nll_loss(output, target)
test_loss += loss.item()*data.size(0)
_, pred = torch.max(output, 1)
correct = np.squeeze(pred.eq(target.data.view_as(pred))).to(device)
for i in range(load_data.batch_size):
try:
label = target.data[i]
class_correct[label] += correct[i].item()
class_total[label] += 1
except IndexError:
break
# calculate and print avg test loss
test_loss = test_loss/len(load_data.test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))
for i in range(10):
if class_total[i] > 0:
print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
str(i), 100 * class_correct[i] / class_total[i],
np.sum(class_correct[i]), np.sum(class_total[i])))
else:
print('Test Accuracy of %5s: N/A (no training examples)' )
print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
100. * np.sum(class_correct) / np.sum(class_total),
np.sum(class_correct), np.sum(class_total)))
acc = (
100. * np.sum(class_correct) / np.sum(class_total),
np.sum(class_correct), np.sum(class_total))
name = f"model-{acc}.pt"
name2 = f"model-{acc}.pth"
save_path = os.path.join("models", name)
save_path2 = os.path.join("models", name2)
torch.save(model, save_path)
torch.save(model, save_path2)
model = NeuralNet().to(device)
summary(model, input_size=(1, 28, 28))
training()
evalution()

ValueError: Cannot feed value of shape (1, 2048, 2048, 1) for Tensor 'image_tensor:0', which has shape '(?, ?, ?, 3)'

Using TensorFlow I am trying to detect one object(png and grayscale image). I have trained and exported a model.ckpt successfully. Now I am trying to restore the saved model.ckpt for prediction. Here is the script:
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
if tf.__version__ != '1.4.0':
raise ImportError('Please upgrade your tensorflow installation to v1.4.0!')
# This is needed to display the images.
#matplotlib inline
# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from utils import label_map_util
from utils import visualization_utils as vis_util
MODEL_NAME = 'melon_graph'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('training', 'object_detection.pbtxt')
NUM_CLASSES = 1
detection_graph = tf.Graph()
with detection_graph.as_default():
od_graph_def = tf.GraphDef()
with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
serialized_graph = fid.read()
od_graph_def.ParseFromString(serialized_graph)
tf.import_graph_def(od_graph_def, name='')
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
def load_image_into_numpy_array(image):
(im_width, im_height) = image.size
return np.array(image.getdata()).reshape((im_height, im_width, 1)).astype(np.float64)
# For the sake of simplicity we will use only 2 images:
# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'te_data{}.png'.format(i)) for i in range(1, 336) ]
# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
with detection_graph.as_default():
with tf.Session(graph=detection_graph) as sess:
# Definite input and output Tensors for detection_graph
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Each box represents a part of the image where a particular object was detected.
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represent how level of confidence for each of the objects.
# Score is shown on the result image, together with the class label.
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')
for image_path in TEST_IMAGE_PATHS:
image = Image.open(image_path)
# the array based representation of the image will be used later in order to prepare the
# result image with boxes and labels on it.
image_np = load_image_into_numpy_array(image)
# Expand dimensions since the model expects images to have shape: [1, None, None, 3]
image_np_expanded = np.expand_dims(image_np, axis=0)
# Actual detection.
(boxes, scores, classes, num) = sess.run([detection_boxes, detection_scores, detection_classes, num_detections], feed_dict={image_tensor: image_np_expanded})
# Visualization of the results of a detection.
vis_util.visualize_boxes_and_labels_on_image_array(image_np,np.squeeze(boxes),np.squeeze(classes).astype(np.float64), np.squeeze(scores), category_index, use_normalized_coordinates=True, line_thickness=5)
plt.figure(figsize=IMAGE_SIZE)
plt.imshow(image_np)
and this is the error
Traceback (most recent call last): File "cochlear_detection.py",
line 81, in
(boxes, scores, classes, num) = sess.run([detection_boxes, detection_scores, detection_classes, num_detections],
feed_dict={image_tensor: image_np_expanded}) File
"/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py",
line 889, in run
run_metadata_ptr) File "/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py",
line 1096, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape()))) ValueError: Cannot feed value of shape (1, 2048, 2048, 1) for Tensor
'image_tensor:0', which has shape '(?, ?, ?, 3)'

Loading a pretrained model fails when multiple GPU was used for training

I have trained a network model and saved its weights and architecture via checkpoint = ModelCheckpoint(filepath='weights.hdf5') callback. During training, I am using multiple GPUs by calling the funtion below:
def make_parallel(model, gpu_count):
def get_slice(data, idx, parts):
shape = tf.shape(data)
size = tf.concat([ shape[:1] // parts, shape[1:] ],axis=0)
stride = tf.concat([ shape[:1] // parts, shape[1:]*0 ],axis=0)
start = stride * idx
return tf.slice(data, start, size)
outputs_all = []
for i in range(len(model.outputs)):
outputs_all.append([])
#Place a copy of the model on each GPU, each getting a slice of the batch
for i in range(gpu_count):
with tf.device('/gpu:%d' % i):
with tf.name_scope('tower_%d' % i) as scope:
inputs = []
#Slice each input into a piece for processing on this GPU
for x in model.inputs:
input_shape = tuple(x.get_shape().as_list())[1:]
slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx':i,'parts':gpu_count})(x)
inputs.append(slice_n)
outputs = model(inputs)
if not isinstance(outputs, list):
outputs = [outputs]
#Save all the outputs for merging back together later
for l in range(len(outputs)):
outputs_all[l].append(outputs[l])
# merge outputs on CPU
with tf.device('/cpu:0'):
merged = []
for outputs in outputs_all:
merged.append(merge(outputs, mode='concat', concat_axis=0))
return Model(input=model.inputs, output=merged)
With the testing code:
from keras.models import Model, load_model
import numpy as np
import tensorflow as tf
model = load_model('cpm_log/deneme.hdf5')
x_test = np.random.randint(0, 255, (1, 368, 368, 3))
output = model.predict(x = x_test, batch_size=1)
print output[4].shape
I got the error below:
Traceback (most recent call last):
File "cpm_test.py", line 5, in <module>
model = load_model('cpm_log/Jun5_1000/deneme.hdf5')
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 240, in load_model
model = model_from_config(model_config, custom_objects=custom_objects)
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 301, in model_from_config
return layer_module.deserialize(config, custom_objects=custom_objects)
File "/usr/local/lib/python2.7/dist-packages/keras/layers/__init__.py", line 46, in deserialize
printable_module_name='layer')
File "/usr/local/lib/python2.7/dist-packages/keras/utils/generic_utils.py", line 140, in deserialize_keras_object
list(custom_objects.items())))
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2378, in from_config
process_layer(layer_data)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2373, in process_layer
layer(input_tensors[0], **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 578, in __call__
output = self.call(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 659, in call
return self.function(inputs, **arguments)
File "/home/muhammed/DEV_LIBS/developments/mocap/pose_estimation/training/cpm/multi_gpu.py", line 12, in get_slice
def get_slice(data, idx, parts):
NameError: global name 'tf' is not defined
By inspecting the error output, I decide that the problem is with the parallelization code. However, I can't resolve the issue.
You may need to use custom_objects to enable loading of the model.
import tensorflow as tf
model = load_model('model.h5', custom_objects={'tf': tf,})

Callbackfunction modelcheckpoint causes error in keras

I seem to get this error when I am using the callback function modelcheckpoint..
I read from a github issue that the solution would be make use of model.get_weight, but I am implicitly only storing that since i am only storing the one with best weight.
Keras only seem to save weights using h5, which make me question is there any other way to do store them using the eras API, if so how? If not, how do i store it?
Made an example to recreate the problem:
#!/usr/bin/python
import glob, os
import sys
from os import listdir
from os.path import isfile, join
import numpy as np
import warnings
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from keras.utils import np_utils
from keras import metrics
import keras
from keras import backend as K
from keras.models import Sequential
from keras.optimizers import SGD, Adam
from keras.layers.core import Dense, Activation, Lambda, Reshape,Flatten
from keras.layers import Conv1D,Conv2D,MaxPooling2D, MaxPooling1D, Reshape
#from keras.utils.visualize_util import plot
from keras.models import Model
from keras.layers import Input, Dense
from keras.layers.merge import Concatenate, Add
import h5py
import random
import tensorflow as tf
import math
from keras.callbacks import CSVLogger
from keras.callbacks import ModelCheckpoint
if len(sys.argv) < 5:
print "Missing Arguments!"
print "python keras_convolutional_feature_extraction.py <workspace> <totale_frames> <fbank-dim> <window-height> <batch_size>"
print "Example:"
print "python keras_convolutional_feature_extraction.py deltas 15 40 5 100"
sys.exit()
total_frames = int(sys.argv[2])
total_frames_with_deltas = total_frames*3
dim = int(sys.argv[3])
window_height = int(sys.argv[4])
inserted_batch_size = int(sys.argv[5])
stride = 1
splits = ((dim - window_height)+1)/stride
#input_train_data = "/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_train_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(inserted_batch_size)+"_fws_input"
#output_train_data ="/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_train_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(inserted_batch_size)+"_fws_output"
#input_test_data = "/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_test_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(1)+"_fws_input"
#output_test_data = "/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_test_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(1)+"_fws_output"
#train_files =[f for f in listdir(input_train_data) if isfile(join(input_train_data, f))]
#test_files =[f for f in listdir(input_test_data) if isfile(join(input_test_data, f))]
#print len(train_files)
np.random.seed(100)
print "hallo"
def train_generator():
while True:
# input = random.choice(train_files)
# h5f = h5py.File(input_train_data+'/'+input, 'r')
# train_input = h5f['train_input'][:]
# train_output = h5f['train_output'][:]
# h5f.close()
train_input = np.random.randint(100,size=((inserted_batch_size,splits*total_frames_with_deltas,window_height,3)))
train_list_list = []
train_input = train_input.reshape((inserted_batch_size,splits*total_frames_with_deltas,window_height,3))
train_input_list = np.split(train_input,splits*total_frames_with_deltas,axis=1)
for i in range(len(train_input_list)):
train_input_list[i] = train_input_list[i].reshape(inserted_batch_size,window_height,3)
#for i in range(len(train_input_list)):
# train_input_list[i] = train_input_list[i].reshape(inserted_batch_size,33,window_height,1,3)
train_output = np.random.randint(5, size = (1,total_frames,5))
middle = int(math.ceil(total_frames/2))
train_output = train_output[:,middle:middle+1,:].reshape((inserted_batch_size,1,5))
#print train_output.shape
#print len(train_input_list)
#print train_input_list[0].shape
yield (train_input_list, train_output)
print "hallo"
def test_generator():
while True:
# input = random.choice(test_files)
# h5f = h5py.File(input_test_data+'/'+input, 'r')
# test_input = h5f['test_input'][:]
# test_output = h5f['test_output'][:]
# h5f.close()
test_input = np.random.randint(100,size=((inserted_batch_size,splits*total_frames_with_deltas,window_height,3)))
test_input = test_input.reshape((inserted_batch_size,splits*total_frames_with_deltas,window_height,3))
test_input_list = np.split(test_input,splits*total_frames_with_deltas,axis=1)
#test_input_list = np.split(test_input,45,axis=3)
for i in range(len(test_input_list)):
test_input_list[i] = test_input_list[i].reshape(inserted_batch_size,window_height,3)
#for i in range(len(test_input_list)):
# test_input_list[i] = test_input_list[i].reshape(inserted_batch_size,33,window_height,1,3)
test_output = np.random.randint(5, size = (1,total_frames,5))
middle = int(math.ceil(total_frames/2))
test_output = test_output[:,middle:middle+1,:].reshape((inserted_batch_size,1,5))
yield (test_input_list, test_output)
print "hallo"
def fws():
#print "Inside"
# Params:
# batch , lr, decay , momentum, epochs
#
#Input shape: (batch_size,40,45,3)
#output shape: (1,15,50)
# number of unit in conv_feature_map = splitd
next(train_generator())
model_output = []
list_of_input = [Input(shape=(8,3)) for i in range(splits*total_frames_with_deltas)]
output = []
#Conv
skip = total_frames_with_deltas
for steps in range(total_frames_with_deltas):
conv = Conv1D(filters = 100, kernel_size = 8)
column = 0
for _ in range(splits):
#print "column " + str(column) + "steps: " + str(steps)
output.append(conv(list_of_input[(column*skip)+steps]))
column = column + 1
#print len(output)
#print splits*total_frames_with_deltas
conv = []
for section in range(splits):
column = 0
skip = splits
temp = []
for _ in range(total_frames_with_deltas):
temp.append(output[((column*skip)+section)])
column = column + 1
conv.append(Add()(temp))
#print len(conv)
output_conc = Concatenate()(conv)
#print output_conc.get_shape
output_conv = Reshape((splits, -1))(output_conc)
#print output_conv.get_shape
#Pool
pooled = MaxPooling1D(pool_size = 6, strides = 2)(output_conv)
reshape = Reshape((1,-1))(pooled)
#Fc
dense1 = Dense(units = 1024, activation = 'relu', name = "dense_1")(reshape)
#dense2 = Dense(units = 1024, activation = 'relu', name = "dense_2")(dense1)
dense3 = Dense(units = 1024, activation = 'relu', name = "dense_3")(dense1)
final = Dense(units = 5, activation = 'relu', name = "final")(dense3)
model = Model(inputs = list_of_input , outputs = final)
sgd = SGD(lr=0.1, decay=1e-1, momentum=0.9, nesterov=True)
model.compile(loss="categorical_crossentropy", optimizer=sgd , metrics = ['accuracy'])
print "compiled"
model_yaml = model.to_yaml()
with open("model.yaml", "w") as yaml_file:
yaml_file.write(model_yaml)
print "Model saved!"
log= CSVLogger('/home/carl/kaldi-trunk/dnn/experimental/yesno_cnn_50_training_total_frames_'+str(total_frames)+"_dim_"+str(dim)+"_window_height_"+str(window_height)+".csv")
filepath='yesno_cnn_50_training_total_frames_'+str(total_frames)+"_dim_"+str(dim)+"_window_height_"+str(window_height)+"weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_weights_only=True, mode='max')
print "log"
#plot_model(model, to_file='model.png')
print "Fit"
hist_current = model.fit_generator(train_generator(),
steps_per_epoch=444,#len(train_files),
epochs = 10000,
verbose = 1,
validation_data = test_generator(),
validation_steps=44,#len(test_files),
pickle_safe = True,
workers = 4,
callbacks = [log,checkpoint])
fws()
Execute the script by: python name_of_script.py yens 50 40 8 1
which give me a full traceback:
full traceback
Error:
carl#ca-ThinkPad-T420s:~/Dropbox$ python mini.py yesno 50 40 8 1
Using TensorFlow backend.
Couldn't import dot_parser, loading of dot files will not be possible.
hallo
hallo
hallo
compiled
Model saved!
log
Fit
/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:2252: UserWarning: Expected no kwargs, you passed 1
kwargs passed to function are ignored with Tensorflow backend
warnings.warn('\n'.join(msg))
Epoch 1/10000
2017-05-26 13:01:45.851125: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-26 13:01:45.851345: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-26 13:01:45.851392: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
443/444 [============================>.] - ETA: 4s - loss: 100.1266 - acc: 0.3138Epoch 00000: saving model to yesno_cnn_50_training_total_frames_50_dim_40_window_height_8weights-improvement-00-0.48.hdf5
Traceback (most recent call last):
File "mini.py", line 205, in <module>
File "mini.py", line 203, in fws
File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 88, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1933, in fit_generator
callbacks.on_epoch_end(epoch, epoch_logs)
File "/usr/local/lib/python2.7/dist-packages/keras/callbacks.py", line 77, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/usr/local/lib/python2.7/dist-packages/keras/callbacks.py", line 411, in on_epoch_end
self.model.save_weights(filepath, overwrite=True)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2503, in save_weights
save_weights_to_hdf5_group(f, self.layers)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2746, in save_weights_to_hdf5_group
f.attrs['layer_names'] = [layer.name.encode('utf8') for layer in layers]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/attrs.py", line 93, in __setitem__
self.create(name, data=value, dtype=base.guess_dtype(value))
File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/attrs.py", line 183, in create
attr = h5a.create(self._id, self._e(tempname), htype, space)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
File "h5py/h5a.pyx", line 47, in h5py.h5a.create (/tmp/pip-4rPeHA-build/h5py/h5a.c:1904)
RuntimeError: Unable to create attribute (Object header message is too large)
If you look at the amount of data Keras is trying to save under layer_names attribute (inside the output HDF5 file being create), you will find that it takes more than 64kB.
np.asarray([layer.name.encode('utf8') for layer in model.layers]).nbytes
>> 77100
I quote from https://support.hdfgroup.org/HDF5/faq/limits.html:
Is there an object header limit and how does that affect HDF5 ?
There is a limit (in HDF5-1.8) of the object header, which is 64 KB.
The datatype for a dataset is stored in the object header, so there is
therefore a limit on the size of the datatype that you can have. (See
HDFFV-1089)
The code above was (almost entirely) copied from the traceback:
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2746, in save_weights_to_hdf5_group
f.attrs['layer_names'] = [layer.name.encode('utf8') for layer in layers]
I am using numpy asarray method to get the figure fast but h5py gets similar figure (I guess), see https://github.com/h5py/h5py/blob/master/h5py/_hl/attrs.py#L102 if you want to find exact figure.
Anyway, either you will need to implement your own methods for saving/loading of the weights (or use existing workarounds), or you need to give a really short name to ALL the layers inside your model :), something like this:
list_of_input = [Input(shape=(8,3), name=('i%x' % i)) for i in range(splits*total_frames_with_deltas)]
conv = Conv1D(filters = 100, kernel_size = 8, name='cv%x' % steps)
conv.append(Add(name='add%x' % section)(temp))
output_conc = Concatenate(name='ct')(conv)
output_conv = Reshape((splits, -1), name='rs1')(output_conc)
pooled = MaxPooling1D(pool_size = 6, strides = 2, name='pl')(output_conv)
reshape = Reshape((1,-1), name='rs2')(pooled)
dense1 = Dense(units = 1024, activation = 'relu', name = "d1")(reshape)
dense2 = Dense(units
= 1024, activation = 'relu', name = "d2")(dense1)
dense3 = Dense(units = 1024, activation = 'relu', name = "d3")(dense1)
final = Dense(units = 5, activation = 'relu', name = "fl")(dense3)
You mustn't forget to name all the layers because the (numpy) string array into which the layer names are converted is using the size of the longest string for each individual string in it when it is saved!
After renaming the layers as proposed above (which takes almost 26kB) the model is saved successfully. Hope this elaborate answer helps someone.
Update: I have just made a PR to Keras which should fix the issue without implementing any custom loading/saving methods, see 7508
A simple solution, albeit possibly not the most elegant, could be to run a while loop with epochs = 1.
Get the weights at the end of every epoch together with the accuracy and the loss
Save the weights to file 1 with model.get_weight
if accuracy is greater than at the previous epoch (i.e. loop), store the weights to a different file (file 2)
Run the loop again loading the weights from file 1
Break the loops setting a manual early stopping so that it breaks if the loss does not improve for a certain number of loops
You can use get_weights() together with numpy.save.
It's not the best solution, because it will save several files, but it actually works.
The problem is that you won't have the "optimizer" saved with the current states. But you can perhaps work around that by using smaller learning rates after loading.
Custom callback using numpy.save:
def myCallback(epoch,logs):
global storedLoss
#do your comparisons here using the "logs" var.
print(logs)
if (logs['loss'] < storedLoss):
storedLoss = logs['loss']
for i in range(len(model.layers)):
WandB = model.layers[i].get_weights()
if len (WandB) > 0: #necessary because some layers have no weights
np.save("W" + "-" + str(i), WandB[0],False)
np.save("B" + "-" + str(i), WandB[1],False)
#remember that get and set weights use a list: [weights,biases]
#it may happen (not sure) that there is no bias, and thus you may have to check it (len(WandB)==1).
The logs var brings a dictionary with named metrics, such as "loss", and "accuracy", if you used it.
You can store the losses within the callback in a global var, and compare if each loss is better or worse than the last.
When fitting, use the lambda callback:
from keras.callbacks import LambdaCallback
model.fit(...,callbacks=[LambdaCallback(on_epoch_end=myCallback)])
In the example above, I used the LambdaCallback, which has more possibilities than just on_epoch_end.
For loading, do a similar loop:
#you have to create the model first and then set the layers
def loadModel(model):
for i in range(len(model.layers)):
WandBForCheck = model.layers[i].get_weights()
if len (WandBForCheck) > 0: #necessary because some layers have no weights
W = np.load(Wfile + str(i))
B = np.load(Bfile + str(i))
model.layers[i].set_weights([W,B])
See follow-up at https://github.com/fchollet/keras/issues/6766 and https://github.com/farizrahman4u/keras-contrib/pull/90.
I saw the YAML and the root cause is probably that you have so many Inputs. A few Inputs with many dimensions is preferred to many Inputs, especially if you can use scanning and batch operations to do everything efficiently.
Now, ignoring that entirely, here is how you can save and load your model if it has too much stuff to save as JSON efficiently:
You can pass save_weights_only=True. That won't save optimizer weights, so isn't a great solution.
Just put together a PR for saving model weights and optimizer weights but not configuration. When you want to load, first instantiate and compile the model as you did when you were going to train it, then use load_all_weights to load the model and optimizer weights into that model. I'll try to merge it soon so you can use it from the master branch.
You could use it something like this:
from keras.callbacks import LambdaCallback
from keras_contrib.utils.save_load_utils import save_all_weights, load_all_weights
# do some stuff to create and compile model
# use `save_all_weights` as a callback to checkpoint your model and optimizer weights
model.fit(..., callbacks=[LambdaCallback(on_epoch_end=lambda epoch, logs: save_all_weights(model, "checkpoint-{:05d}.h5".format(epoch))])
# use `load_all_weights` to load model and optimizer weights into an existing model
# if not compiled (no `model.optimizer`), this will just load model weights
load_all_weights(model, 'checkpoint-1337.h5')
So I don't endorse the model, but if you want to get it to save and load anyways this should probably work for you.
As a side note, if you want to save weights in a different format, something like this would work.
pickle.dump([K.get_value(w) for w in model.weights], open( "save.p", "wb" ) )
Cheers
Your model architecture must be too large to be saved.
USE get_weights AND set_weights TO SAVE AND LOAD MODEL, RESPECTIVELY.
Do not use callback model checkpoint. just once the training ends, save its weights with pickle.
Have a look at this link: Unable to save DataFrame to HDF5 ("object header message is too large")