pycuda.driver.pagelocked_empty() returns empty list - pycuda

I have retrained a model with TensorFlow v2 and I want to run it on a Jetson Nano GPU. To do that I had to convert the model from .h5 to .pb, then to .onnx, and then to .trt (for which I also had to do the ONNX conversion with opset 12).
Now that I can finally run the model, I am reusing old code that used to work with the old .trt model, but it fails when allocating page-locked memory:
import pycuda.driver as cuda
....
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host_mem = cuda.pagelocked_empty(size, dtype)
    device_mem = cuda.mem_alloc(host_mem.nbytes)
This results in an error:
pycuda_driver.LogicError: cuMemAlloc failed: invalid argument
After further debugging, it turns out that cuda.pagelocked_empty(size, dtype) returns [] at the output binding separable_conv2d_29, where size=0 and dtype=numpy.float32. With the old model, where the code runs, the size is >0 for both input and output bindings.
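One way to catch this before the allocation fails is to check every binding's element count first. The following is a minimal sketch in plain Python, not the TensorRT API: the `shapes` dict is a hypothetical stand-in for what engine.get_binding_shape(binding) returns, and `volume` only mimics the product-of-dimensions calculation.

```python
from functools import reduce
from operator import mul


def volume(shape):
    """Product of all dimensions (a stand-in for the element count of a binding)."""
    return reduce(mul, shape, 1)


def find_empty_bindings(shapes, max_batch_size=1):
    """Return the names of bindings whose element count is not positive.

    `shapes` is a hypothetical dict: binding name -> shape tuple.
    """
    bad = []
    for name, shape in shapes.items():
        size = volume(shape) * max_batch_size
        if size <= 0:
            # A zero (or negative) size would lead to an empty host buffer
            # and a zero-byte device allocation.
            bad.append(name)
    return bad


# Hypothetical shapes: the output binding has a zero-sized dimension.
shapes = {"input_1": (3, 224, 224), "separable_conv2d_29": (0, 7, 7)}
print(find_empty_bindings(shapes))
```

If a binding shows up in that list, the problem is in the converted engine's shapes rather than in the allocation code itself.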

Related

Spark-nlp • combining 'sentiment' and 'emotion' models causes crash

I'm working in a Google Colab notebook and set up via
!wget http://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash
import nlu
A quick version check with nlu.version() confirms 3.4.2.
Several of the official tutorial notebooks (for example, XLNet) create a multi-model pipeline that includes both 'sentiment' and 'emotion'.
Direct copy of content from the notebook:
import pandas as pd
# Download the dataset
!wget -N https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/sarcasm/train-balanced-sarcasm.csv -P /tmp
# Load dataset to Pandas
df = pd.read_csv('/tmp/train-balanced-sarcasm.csv')
pipe = nlu.load('sentiment pos xlnet emotion')
df['text'] = df['comment']
max_rows = 200
predictions = pipe.predict(df.iloc[0:100][['comment','label']], output_level='token')
predictions
However, running a prediction on this pipe results in the following error:
sentimentdl_glove_imdb download started this may take some time.
Approximate size to download 8.7 MB
[OK!]
pos_anc download started this may take some time.
Approximate size to download 3.9 MB
[OK!]
xlnet_base_cased download started this may take some time.
Approximate size to download 417.5 MB
[OK!]
classifierdl_use_emotion download started this may take some time.
Approximate size to download 21.3 MB
[OK!]
glove_100d download started this may take some time.
Approximate size to download 145.3 MB
[OK!]
tfhub_use download started this may take some time.
Approximate size to download 923.7 MB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
---------------------------------------------------------------------------
IllegalArgumentException Traceback (most recent call last)
<ipython-input-1-9b2e4a06bf65> in <module>()
34
35 # NLU to gives us one row per embedded word by specifying the output level
---> 36 predictions = pipe.predict( df.iloc[0:5][['text','label']], output_level='token' )
37
38 display(predictions)
9 frames
/usr/local/lib/python3.7/dist-packages/pyspark/sql/utils.py in raise_from(e)
IllegalArgumentException: requirement failed: Wrong or missing inputCols annotators in SentimentDLModel_6c1a68f3f2c7.
Current inputCols: sentence_embeddings#glove_100d. Dataset's columns:
(column_name=text,is_nlp_annotator=false)
(column_name=document,is_nlp_annotator=true,type=document)
(column_name=sentence,is_nlp_annotator=true,type=document)
(column_name=sentence_embeddings#tfhub_use,is_nlp_annotator=true,type=sentence_embeddings).
Make sure such annotators exist in your pipeline, with the right output names and that they have following annotator types: sentence_embeddings
Having experimented with various combinations of models, it turns out that the problem is caused whenever 'sentiment' and 'emotion' models are specified in the same pipeline (regardless of pipeline order or what other models are listed).
Running pipe = nlu.load('emotion ANY OTHER MODELS') or pipe = nlu.load('sentiment ANY OTHER MODELS') succeeds, so the failure really appears to come only from combining 'sentiment' and 'emotion'.
Is this a known bug? Does anyone have any suggestions for fixing it?
My temporary solution has been to run emoPipe = nlu.load('emotion').predict() in isolation, then inner join the resulting dataframe to the resulting df of pipe = nlu.load('sentiment pos xlnet').predict().
However, I would like to understand better what is failing and to know if there is a way to streamline the inclusion of all models.
Thanks

tensorflow/lite/core/subgraph.cc BytesRequired number of elements overflowed. Node number 1 (CONV_2D) failed to prepare. tflite

I am trying to convert a CNN model into a tflite model. The conversion succeeds, but this error occurs when I try to load and run the model.
I am building a flutter app.
It initializes the Tensorflow Lite runtime but then raises this error.
I/tflite (27856): Initialized TensorFlow Lite runtime.
E/flutter (27856): [ERROR:flutter/lib/ui/ui_dart_state.cc(166)] Unhandled Exception: PlatformException(Failed to load model, Internal error: Unexpected failure when preparing tensor allocations: tensorflow/lite/core/subgraph.cc BytesRequired number of elements overflowed.
E/flutter (27856):
E/flutter (27856): Node number 1 (CONV_2D) failed to prepare.
I think I have figured out the problem.
After spending days on this problem, I found that the model I was converting was an ImageNet-pretrained InceptionV3. The problem may be that some of its layers could not be converted.
I used the following and they worked perfectly fine.
MobileNet and MobileNetV2.
NasNet Mobile version.
Or, if you are new to deep learning and don't want to train a model yourself, you can use Teachable Machine and then convert the result easily.
I hope this could help you guys!! Thank you
I ran into the exact same issue the last few days. I tried to load and run a tflite model on Android. I finally figured out how to solve the problem.
I was creating my model using:
model = Xception(include_top=False)
The important part here is include_top=False, together with the default argument input_shape=None.
If you look at the source code of Xception, Inception, MobileNet, or whatever (that you can find here), you will see that at some point before creating the first layer they call
input_shape = imagenet_utils.obtain_input_shape(
    input_shape,
    default_size=<default_size>,
    min_size=<min_size>,
    data_format=backend.image_data_format(),
    require_flatten=include_top,
    weights=weights)
which is implemented here, with the most important part for us being:
if input_shape:
    ...
else:
    if require_flatten:
        input_shape = default_shape
    else:
        if data_format == 'channels_first':
            input_shape = (3, None, None)
        else:
            input_shape = (None, None, 3)
Thus, if I am not mistaken, when we set include_top to False without passing input_shape, instead of getting the default shape we end up with an undefined number of rows and columns. I am not sure how this is converted to tflite, since no error is raised during conversion, but it really seems that Android cannot work with it (this is probably equivalent to setting an infinite image size). Hence this error when initializing the interpreter:
BytesRequired number of elements overflowed
When I set the proper input_shape argument in the constructor, i.e.
model = Xception(include_top=False, weights=None, input_shape=(rows, cols, channels))
then the converted model was working fine on Android.
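To make the branch quoted above concrete, here is a minimal sketch in plain Python. It is a simplified stand-in, not the Keras implementation: the function names are hypothetical, and the default shape is assumed to be (default_size, default_size, 3) for channels-last data.

```python
def resolve_input_shape(input_shape=None, include_top=True,
                        default_size=299, data_format="channels_last"):
    """Simplified model of the quoted shape-resolution branch."""
    if input_shape:
        # An explicit shape always wins.
        return tuple(input_shape)
    if include_top:
        # With the classifier head, the default size is used (assumed layout).
        if data_format == "channels_first":
            return (3, default_size, default_size)
        return (default_size, default_size, 3)
    # No head and no explicit shape: spatial dimensions stay undefined.
    if data_format == "channels_first":
        return (3, None, None)
    return (None, None, 3)


def has_undefined_dims(shape):
    """True if any dimension is None, i.e. not fixed at build time."""
    return any(dim is None for dim in shape)


print(resolve_input_shape(include_top=False))
print(resolve_input_shape((150, 150, 3), include_top=False))
```

Checking has_undefined_dims on the model's input shape before conversion would flag the situation described here.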
As for why it is initializing correctly with MobileNetV2 in the same situation, i.e. by creating the model like so:
model = MobileNetV2(include_top=False)
I cannot explain...
Hope this brings an answer to your original question.
In fact, this is specified in the documentation, for instance in Xception:
input_shape: optional shape tuple, only to be specified
if `include_top` is False (otherwise the input shape
has to be `(299, 299, 3)`.
It should have exactly 3 inputs channels,
and width and height should be no smaller than 71.
E.g. `(150, 150, 3)` would be one valid value.
Whilst for MobileNetV2:
input_shape: Optional shape tuple, to be specified if you would
like to use a model with an input image resolution that is not
(224, 224, 3).
It should have exactly 3 inputs channels (224, 224, 3).
You can also omit this option if you would like
to infer input_shape from an input_tensor.
If you choose to include both input_tensor and input_shape then
input_shape will be used if they match, if the shapes
do not match then we will throw an error.
E.g. `(160, 160, 3)` would be one valid value.
Although it is not crystal clear.

Keras infinite loop

The code reads my images from Colab folders, then splits them into a training set and a validation set using a generator. I used an existing pretrained model, DenseNet201, as the base for training. However, for reasons I don't understand, the generator remains caught in an infinite loop and the loop that generates the validation data never finishes. Does anyone know how to get around this?
import tensorflow as tf
IMAGE_SIZE = 224
BATCH_SIZE = 64
# IMG_SHAPE was not defined in the original snippet; an RGB input is assumed here
IMG_SHAPE = (IMAGE_SIZE, IMAGE_SIZE, 3)
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2)
train_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    subset='training')
val_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    subset='validation')
base_model = tf.keras.applications.DenseNet201(
    input_shape=IMG_SHAPE,
    include_top=False,
    weights='imagenet')
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation='softmax')
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=100,
                    validation_data=val_generator)
In the line:
history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=100,
                    validation_data=val_generator)
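change steps_per_epoch=100 to steps_per_epoch=len(train_generator). Note that len(train_generator) is already the number of batches per epoch, so it should not be divided by BATCH_SIZE again.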
It finally worked!
!pip uninstall tensorflow
!pip install tensorflow==2.1.0
This issue arises because your validation generator loops forever and never exits. The training generator stops because of the steps_per_epoch=100 argument you provided, but you haven't specified how many times the validation generator must be called before the validation loss is calculated. A similar argument, validation_steps, fixes this:
history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=100,
                    validation_data=val_generator,
                    validation_steps=50)
This way your validation loss will be calculated from the data your validation generator returns over 50 calls, and it won't get stuck in an infinite loop.
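The idea of bounding an endless generator with a step count can be sketched in plain Python. This is only an illustration, not Keras code: endless_batches and steps_for are hypothetical helpers, and itertools.islice plays the role that validation_steps plays in model.fit.

```python
import itertools
import math


def endless_batches(samples, batch_size):
    """Yield batches forever, wrapping around the data once it is exhausted."""
    for start in itertools.cycle(range(0, len(samples), batch_size)):
        yield samples[start:start + batch_size]


def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


samples = list(range(10))  # hypothetical validation set of 10 items
batch_size = 4
validation_steps = steps_for(len(samples), batch_size)

# Without a limit, iterating over endless_batches would never finish.
# islice stops after a fixed number of batches.
batches = list(itertools.islice(endless_batches(samples, batch_size),
                                validation_steps))
print(validation_steps, batches)
```

Deriving the step count from the number of samples, rather than hard-coding 50, makes each validation pass cover the data exactly once.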

Write Matrix Data to Each Member of Datatype in HDF5 file via MATLAB

This is my first go at trying to create an HDF5 file from scratch using the Low-Level commands via MATLAB.
My issue is that I am having a hard time trying to write data to each specific member in the datatype on my dataset.
First, I create a new HDF5 file, and set the right layer of groups:
new_h5 = H5F.create('new_hdf5_file.h5','H5F_ACC_TRUNC','H5P_DEFAULT','H5P_DEFAULT');
new_h5 = H5G.create(new_h5,'first','H5P_DEFAULT','H5P_DEFAULT','H5P_DEFAULT');
new_h5 = H5G.create(new_h5,'second','H5P_DEFAULT','H5P_DEFAULT','H5P_DEFAULT');
Then, I create my datatype:
datatype = H5T.create('H5T_compound',20);
H5T.insert(datatype,'first_element',0,'H5T_NATIVE_INT');
H5T.insert(datatype,'second_element',4,'H5T_NATIVE_DOUBLE');
H5T.insert(datatype,'third_element',12,'H5T_NATIVE_DOUBLE');
Then, I format that into my dataset:
new_h5 = H5D.create(new_h5,'location',datatype,H5S.create('H5S_SCALAR'),'H5P_DEFAULT');
subset = H5D.get_type(H5D.open(new_h5,'/first/second/location'));
mem_type = H5T.get_member_type(subset,0);
I receive an error with the following command:
H5D.write(mem_type,'H5ML_DEFAULT','H5S_ALL','H5S_ALL','H5P_DEFAULT',data);
Error using hdf5lib2
Unhandled HDF5 class (H5T_NO_CLASS) encountered. It is not possible to write to this attribute or dataset.
So, I try this method instead:
new_h5 = H5D.create(new_h5,'location',datatype,H5S.create_simple(2,dims,dims),'H5P_DEFAULT'); %where dims are the dimensions of all matrices of data structure
H5D.write(mem_type,'H5ML_DEFAULT','H5S_ALL','H5S_ALL','H5P_DEFAULT',data); %where data is a structure
I receive an error with this following command:
H5D.write(mem_type,'H5ML_DEFAULT','H5S_ALL','H5S_ALL','H5P_DEFAULT',data);
Error using hdf5lib2
Attempted to transfer too many values to or from the library buffer.
When looking here for the XML tags for the error messages, it describes the above error as "illegalArrayAccess." Apparently, according to this question, you can only write to 4 members without the buffer throwing an error?
Is this correct? How can I correctly write to each member? I am about to reach my mental limit trying to figure this one out.
EDIT:
References kept here for general information:
HDF5 Compound Datatypes Example
HDF5 Compound Datatypes
H5D.write MATLAB Command
I found out why I cannot write data. I have solved the problem. I had my dimensions set incorrectly (which is code I forgot to include originally). My apologies. I had my dimensions like this:
dims = fliplr(size(data_matrix));
Here data_matrix was a 15x250 matrix, so dims described a 250x15 dataspace. The write failed because the library expected a 250x15 block for each member, while the structure only held 250x1 of data per member.
The following code will (generically) work for writing data to each member:
new_h5 = H5F.create('new_hdf5_file.h5','H5F_ACC_TRUNC','H5P_DEFAULT','H5P_DEFAULT');
new_h5 = H5G.create(new_h5,'first','H5P_DEFAULT','H5P_DEFAULT','H5P_DEFAULT');
new_h5 = H5G.create(new_h5,'second','H5P_DEFAULT','H5P_DEFAULT','H5P_DEFAULT');
datatype = H5T.create('H5T_compound',20);
H5T.insert(datatype,'first_element',0,'H5T_NATIVE_INT');
H5T.insert(datatype,'second_element',4,'H5T_NATIVE_DOUBLE');
H5T.insert(datatype,'third_element',12,'H5T_NATIVE_DOUBLE');
dims = fliplr(size(data_matrix)); dims = [1 dims(1,2)];
new_h5 = H5D.create(new_h5,'location',datatype,H5S.create_simple(2,dims,dims),'H5P_DEFAULT');
H5D.write(new_h5,'H5ML_DEFAULT','H5S_ALL','H5S_ALL','H5P_DEFAULT',data_structure);
where data_matrix is a 15x250 matrix containing all data, and where data_structure is a structure containing 15 fields, each 250x1 in size.
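The byte offsets 0, 4 and 12 and the total size 20 used for the compound datatype follow from packing one 4-byte integer and two 8-byte doubles back to back. A small sketch with Python's standard struct module shows the arithmetic; it assumes those member sizes and no alignment padding, and does not touch HDF5 at all.

```python
import struct

# "<" selects standard sizes without alignment padding:
# i = 4-byte int, d = 8-byte double (assumed to match the native types used).
members = [
    ("first_element", "<i"),
    ("second_element", "<d"),
    ("third_element", "<d"),
]

offset = 0
offsets = {}
for name, fmt in members:
    offsets[name] = offset           # where this member starts
    offset += struct.calcsize(fmt)   # advance by the member's size

total_size = struct.calcsize("<idd")  # size of the whole packed record

print(offsets)
print(total_size)
```

If a member type changes, the later offsets and the size passed to H5T.create have to be recomputed the same way.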

Tensorflow 0.8 Import and Export output tensors problems

I am using TensorFlow 0.8 with Python 3. I am trying to train a neural network, and the goal is to automatically export/import the network state every 50 iterations. The problem is that when I export the output tensors at the first iteration, their names are ['Neg:0', 'Slice:0'], but when I export them at the second iteration, the names have changed to ['import/Neg:0', 'import/Slice:0'], and importing these output tensors then fails:
ValueError: Specified colocation to an op that does not exist during import: import/Variable in import/Variable/read
I wonder if anyone has ideas on this problem. Thanks!!!
That's how tf.import_graph_def works.
If you don't want the prefix, just set the name parameter to the empty string, as shown in the following example.
# import the model into the current graph
with tf.Graph().as_default() as graph:
    const_graph_def = tf.GraphDef()
    with open(TRAINED_MODEL_FILENAME, 'rb') as saved_graph:
        const_graph_def.ParseFromString(saved_graph.read())
        # replace current graph with the saved graph def (and content)
        # name="" is important because otherwise (with name=None)
        # the graph definitions will be prefixed with import.
        # eg: the defined operation FC2/unscaled_logits:0
        # will be import/FC2/unscaled_logits:0
        tf.import_graph_def(const_graph_def, name="")
[...]
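The effect of the name parameter on tensor names can be modelled in a few lines of plain Python. This is only a sketch of the naming rule described above, not TensorFlow code; imported_name is a hypothetical helper.

```python
def imported_name(original, name=None):
    """Model how an imported tensor name is formed.

    name=None falls back to the default prefix "import";
    name="" keeps the original name unchanged.
    """
    prefix = "import" if name is None else name
    return f"{prefix}/{original}" if prefix else original


outputs = ["Neg:0", "Slice:0"]
print([imported_name(n) for n in outputs])           # default prefix
print([imported_name(n, name="") for n in outputs])  # no prefix
```

With the empty prefix, names stay stable across export/import cycles, so the same output tensor names can be used at every iteration.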