caffe `"Python"` layer not found? - neural-network

I have installed caffe, uncommenting
WITH_PYTHON_LAYER=1
in 'Makefile.config'
When I use a python data layer in my net.prototxt, it says
Unknown layer type: Python
To cross check it in python interface,
I tried
import caffe
from caffe import layers as L
L.Python()
this seems to work ,no error then.
Where is the problem?

You can find out what layer types caffe has in python simply by examining caffe.layer_types_list(). For example, if you actually have a "Python" layer, then
list(caffe.layer_type_list()).index('Python')
Should actually return an index for its name in the layer types list.
As for L.Python() - this caffe.NetSpec() interface is used to programatically write a net prototxt, and at the writing stage layer types are not checked. You can actually write whatever layer you want:
L.YouDontThinkTheyNameALayerLikeThis()
Is totally cool. Even converting it to prototxt:
print "{}".format(L.YouDontThinkTheyNameALayerLikeThis().to_proto())
Actually results with this:
layer {
name: "YouDontThinkTheyNameALayerLikeThis1"
type: "YouDontThinkTheyNameALayerLikeThis"
top: "YouDontThinkTheyNameALayerLikeThis1"
}
You'll get an error message once you try to run this "net" using caffe...

Related

Implementing TensorFlowjs model in Ionic framework with Capacitor

I am working on a project where I am using Ionic and TensorFlow for machine learning. I have converted my TensorFlow model to a tensorflowjs model. I have put the model.json file and shard files of the tensorflowjs model in the assets folder in Ionic. Basically, I have put my tensorflowjs model in the assets folder of ionic. I am wanting to use Capacitor to access the camera and allow users to take photos. Then, the photos will be passed to the tensorflowjs model in assets to get and display a prediction for that user.
Here is my typescript code:
import { Component, OnInit, ViewChild, ElementRef, Renderer2 } from '#angular/core';
import { Plugins, CameraResultType, CameraSource} from '#capacitor/core';
import { DomSanitizer, SafeResourceUrl} from '#angular/platform-browser';
import { Platform } from '#ionic/angular';
import * as tf from '#tensorflow/tfjs';
import { rendererTypeName } from '#angular/compiler';
import { Base64 } from '#ionic-native/base64/ngx';
import { defineCustomElements } from '#ionic/pwa-elements/loader';
const { Camera } = Plugins;
#Component({
  selector: 'app-predict',
  templateUrl: './predict.page.html',
  styleUrls: ['./predict.page.scss'],
})
export class PredictPage{
  linearModel : tf.Sequential;
  prediction : any;
  InputTaken : any;
  ctx: CanvasRenderingContext2D;
  pos = { x: 0, y: 0 };
  canvasElement : any;
  photo: SafeResourceUrl;
  model: tf.LayersModel;
  constructor(public el : ElementRef , public renderer : Renderer2 , public platform : Platform, private base64: Base64,
    private sanitizer: DomSanitizer) 
  {}
  
  async takePicture() {
    const image = await Camera.getPhoto({
        quality: 90,
        allowEditing: true,
        resultType: CameraResultType.DataUrl,
        source: CameraSource.Camera});
      const model = await tf.loadLayersModel('src/app/assets/model.json');
      this.photo = this.sanitizer.bypassSecurityTrustResourceUrl(image.base64String);
      defineCustomElements(window);
    const pred = await tf.tidy(() => {
          // Make and format the predications
        const output = this.model.predict((this.photo)) as any;
                                
          // Save predictions on the component
        this.prediction = Array.from(output.dataSync()); 
        });
  }
}
In this code, I have imported the necessary tools. Then, I have my constructor function and a takepicture() function. In the takepicture function, I have included functionality for the user to take pictures. However, I am having trouble with passing the pictures taken to the tensorflowjs model to get a prediction. I am passing the picture taken to the tensorflowjs model in this line of code:
const output = this.model.predict((this.photo)) as any;
However, I am getting an error stating that:
Argument of type 'SafeResourceUrl' is not assignable to parameter of type 'Tensor | Tensor[]'.\n Type 'SafeResourceUrl' is missing the following properties from type 'Tensor[]': length, pop, push, concat, and 26 more.
It would be appreciated if I could receive some guidance regarding this topic.
The model is expecting you to pass in a Tensor input, but you're passing it some other image format that isn't in the tfjs ecosystem. You should first convert this.photo to a Tensor, or perhaps easier, convert image.base64String to tensor.
Since you seem to be using node, try this code
// Move the base64 image into a buffer
const b = Buffer.from(image.base64String, 'base64')
// get the tensor
const image = tf.node.decodeImage(b)
// Compute the output
const output = this.model.predict(image) as any;
Other conversion solutions here: convert base64 image to tensor

Use TensorFlow model with Swift for iOS

We are trying to use TensorFlow Face Mesh model within our iOS app. Model details: https://drive.google.com/file/d/1VFC_wIpw4O7xBOiTgUldl79d9LA-LsnA/view.
I followed TS official tutorial for setting up the model: https://firebase.google.com/docs/ml-kit/ios/use-custom-models and also printed the model Input-Output using the Python script in the tutorial and got this:
INPUT
[ 1 192 192 3]
<class 'numpy.float32'>
OUTPUT
[ 1 1 1 1404]
<class 'numpy.float32'>
At this point, I'm pretty lost trying to understand what those numbers mean, and how do I pass the input image and get the output face mesh points using the model Interpreter. Here's my Swift code so far:
let coreMLDelegate = CoreMLDelegate()
var interpreter: Interpreter
// Core ML delegate will only be created for devices with Neural Engine
if coreMLDelegate != nil {
interpreter = try Interpreter(modelPath: modelPath,
delegates: [coreMLDelegate!])
} else {
interpreter = try Interpreter(modelPath: modelPath)
}
Any help will be highly appreciated!
What those numbers mean completely depends on the model you're using. It's unrelated to both TensorFlow and Core ML.
The output is a 1x1x1x1404 tensor, which basically means you get a list of 1404 numbers. How to interpret those numbers depends on what the model was designed to do.
If you didn't design the model yourself, you'll have to find documentation for it.

LSTM dimension issues with swift/coreml implementation

I have generated a LSTM model for audio classification using keras with tf as the backend. Upon conversion to a .mlmodel using coremltools I am running into issues as you can see here. The dimensions are very different from what is expected.
I used this for my base in xcode in swift.
Particularly this snip is what I believe is giving me the trouble:
do {
let request = try SNClassifySoundRequest(mlModel: soundClassifier.model)
try analyzer.add(request, withObserver: resultsObserver)
} catch {
print("Unable to prepare request: \(error.localizedDescription)")
return
}
}
Running this model gives me the following error:
Invalid model, inputDescriptions.count = 5
Unable to prepare request: Invalid model, inputDescriptions.count = 5
Even though when I build the model I see what is expected in the spec:
description {
input {
name: "audioSamples"
shortDescription: "Audio from microphone"
type {
multiArrayType {
shape: 13
dataType: DOUBLE
}
}
}
I am trying to incorporate this post into my code but I am not sure how to format it to my needs. Any advice is greatly appreciated. I can see that MLMultiArray is the key to my question, but I am unsure of: how to put the proper data into it and how to push this into a SNClassifySoundRequest type.
keras == 2.3.1
coremltools == 3.3
When you use SNClassifySoundRequest, your model needs to have a certain structure. I don't know the exact details off the top of my head, but I think it needs to be a pipeline where the first model is a built-in model that converts the audio to spectrograms.
If you trained your model with Keras, it's most likely not compatible with the requirements of SNClassifySoundRequest.
The good news is that you don't need SNClassifySoundRequest to run your model. Simply call soundClassifier.prediction(...) on the model.
Note that you need to pass in the input but also the hidden states of the LSTM layers. Core ML will not automatically manage the LSTM state for you (unlike Keras).

Obsolete/erroneous code in convolutional_mlp.py at DeepLearningTutorials?

This code contains the following tidbit:
from theano.tensor.nnet import conv2d
...
# convolve input feature maps with filters
conv_out = conv2d(
input=input,
filters=self.W,
filter_shape=filter_shape,
input_shape=image_shape
)
which raises an exception due to 'input_shape' not being found, despite being mentioned in the documentation where it says that:
"image_shape ... – Deprecated alias for input_shape"
Looking at conv.py both locally and in the source I found:
def conv2d(input, filters, image_shape=None, filter_shape=None,
border_mode='valid', subsample=(1, 1), **kargs):
Needless to say, there is no trace of input_shape.
If one modifies the code above as follows
# convolve input feature maps with filters
conv_out = conv2d(
input=input,
filters=self.W,
filter_shape=filter_shape,
image_shape=image_shape
)
, the exception disappears and the code runs fine.
What am I missing? If image_shape is deprecated, how come it works while input_shape does not?
Is the theano version at the repository obsolete?
PS: I would have liked to ask directly the folks at http://deeplearning.net, but I could not find how.
Are you sure you have the latest version installed?
conv.py contains the deprecated implementation of conv2d. The new implementation can be found in __init__.py
Make sure you are using the import statement
from theano.tensor.nnet import conv2d
and not
from theano.tensor.nnet.conv import conv2d
since the second one is going to import the deprecated implementation

How to connect ITK with VTK?

I'm reading DICOM series using itk and converting them into VTK for visualization purposes. Even though I manage to visualize DICOM series in 3 different windows with 3 different orientations (XY, XZ and YZ) I can't even click on the windows. When I click or try to change the slice viewed, my code gives an access violation error. I'm using ImageViewer2 to visualize the slices. A file called itkVTKImageExportBase.cxx is opened when I try to find out what the error is. The lines referred to are:
void VTKImageExportBase::UpdateInformationCallbackFunction(void* userData)
{
static_cast<VTKImageExportBase*>
(userData)->UpdateInformationCallback();
}
My code is as follows:
typedef itk::VTKImageExport< ImageType > ExportFilterType;
ExportFilterType::Pointer itkExporter = ExportFilterType::New();
itkExporter->SetInput( reader->GetOutput() );
// Create the vtkImageImport and connect it to the itk::VTKImageExport instance.
vtkImageImport* vtkImporter = vtkImageImport::New();
ConnectPipelines(itkExporter, vtkImporter);
pViewerXY->SetInput(vtkImporter->GetOutput());
pViewerXY->SetSlice(3);
pViewerXY->SetSliceOrientationToXY();
pViewerXY->SetupInteractor(m_pVTKWindow_1);
pViewerXY->UpdateDisplayExtent();
m_pVTKWindow_1->AddObserver(vtkCommand::KeyPressEvent, m_pVTKWindow_1_CallbackCommand);
m_pVTKWindow_1->Update();
pViewerXZ->SetInput (vtkImporter->GetOutput());
pViewerXZ->SetSliceOrientationToXZ();
pViewerXZ->SetupInteractor(m_pVTKWindow_2);
pViewerXZ->UpdateDisplayExtent();
m_pVTKWindow_2->AddObserver(vtkCommand::KeyPressEvent, m_pVTKWindow_2_CallbackCommand);
m_pVTKWindow_2->Update();
pViewerYZ->SetInput (vtkImporter->GetOutput());
pViewerYZ->SetSliceOrientationToYZ();
pViewerYZ->SetupInteractor(m_pVTKWindow_3);
pViewerYZ->UpdateDisplayExtent();
m_pVTKWindow_3->AddObserver(vtkCommand::KeyPressEvent, m_pVTKWindow_3_CallbackCommand);
m_pVTKWindow_3->Update();
pViewerXX windows are imageviewer2 objects whereas m_pVTKWindow_X refers to wxVTK objects to use in wxWidgets GUI package.
Optional: My exporter and importer are as given below:
template <typename ITK_Exporter, typename VTK_Importer>
void ConnectPipelines(ITK_Exporter exporter, VTK_Importer* importer)
{
importer->SetUpdateInformationCallback(exporter->GetUpdateInformationCallback());
importer->SetPipelineModifiedCallback(exporter->GetPipelineModifiedCallback());
importer->SetWholeExtentCallback(exporter->GetWholeExtentCallback());
importer->SetSpacingCallback(exporter->GetSpacingCallback());
importer->SetOriginCallback(exporter->GetOriginCallback());
importer->SetScalarTypeCallback(exporter->GetScalarTypeCallback());
importer->SetNumberOfComponentsCallback(exporter->GetNumberOfComponentsCallback());
importer->SetPropagateUpdateExtentCallback(exporter->GetPropagateUpdateExtentCallback());
importer->SetUpdateDataCallback(exporter->GetUpdateDataCallback());
importer->SetDataExtentCallback(exporter->GetDataExtentCallback());
importer->SetBufferPointerCallback(exporter->GetBufferPointerCallback());
importer->SetCallbackUserData(exporter->GetCallbackUserData());
}
/**
* This function will connect the given vtkImageExport filter to
* the given itk::VTKImageImport filter.
*/
template <typename VTK_Exporter, typename ITK_Importer>
void ConnectPipelines(VTK_Exporter* exporter, ITK_Importer importer)
{
importer->SetUpdateInformationCallback(exporter->GetUpdateInformationCallback());
importer->SetPipelineModifiedCallback(exporter->GetPipelineModifiedCallback());
importer->SetWholeExtentCallback(exporter->GetWholeExtentCallback());
importer->SetSpacingCallback(exporter->GetSpacingCallback());
importer->SetOriginCallback(exporter->GetOriginCallback());
importer->SetScalarTypeCallback(exporter->GetScalarTypeCallback());
importer->SetNumberOfComponentsCallback(exporter->GetNumberOfComponentsCallback());
importer->SetPropagateUpdateExtentCallback(exporter->GetPropagateUpdateExtentCallback());
importer->SetUpdateDataCallback(exporter->GetUpdateDataCallback());
importer->SetDataExtentCallback(exporter->GetDataExtentCallback());
importer->SetBufferPointerCallback(exporter->GetBufferPointerCallback());
importer->SetCallbackUserData(exporter->GetCallbackUserData());
}
I don't have an exact answer to your question, but have you considered using one of the common medical image processing frameworks? There are a couple of them like MITK (mitk.org) or Slicer3D (slicer.org). They do a great job of linking together ITK, VTK and a sophisticated GUI Framework like QT (in the MITK case).
I have worked a long time in medical image processing and used MITK extensively. In my opinion, using a medical image processing framework really helps you focus on your real image processing problems, rather than trying to build processing/visualization pipelines for different types of visualizations.
If you look at InsightApplications, there are two methods:
The same as you tried, which is here, or
This one, which creates a pipeline object that can be connected on both sides. We're actually using this, and it works out quite well for us. You can copy this class into your code and use it.
There are some interesting usage examples in there too. Take a look at them and see if you can modify anything for your requirement.
Classes in the ITKVTkGlue module can be used to convert an ITK image to in a pipeline. See the tests for examples of how the classes are applied.