MLModel style transfer prediction: scaling the style effect (brushstrokes) - Swift

I have a style transfer model that was trained in PyTorch and converted to an .mlmodel via ONNX. The style image was 1500x2000. Using coremltools I enabled two input sizes: 256x256 and 1500x2000.
Now I can pass either image size to the prediction process. Here are the results:
On the left is the 1500x2000 image, and on the right is the 256x256 one (scaled up after processing).
Is it possible to pass the big image but get the larger brushstrokes you can see in the image on the right? In other words, I want to keep the image size and quality (1500x2000) but change the scale of the style (brushstrokes). Or is that not possible because it depends entirely on the size of the style image I used to train the model?
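One common workaround (not stated in the thread, so treat it as an assumption): the apparent brushstroke size is tied to the network's receptive field, which is fixed in pixels. Running the model on a downscaled copy and upscaling the result makes strokes proportionally larger, at the cost of fine detail. A minimal sketch with a placeholder `stylize` function (hypothetical, standing in for the real model call) and a nearest-neighbor resize:

```python
import numpy as np

def stylize(img):
    """Placeholder for the real model call (hypothetical) --
    e.g. running the Core ML / PyTorch style-transfer network."""
    return img  # identity stand-in

def nn_resize(img, h, w):
    """Nearest-neighbor resize for an (H, W, C) array."""
    src_h, src_w = img.shape[:2]
    rows = np.arange(h) * src_h // h
    cols = np.arange(w) * src_w // w
    return img[rows[:, None], cols]

def stylize_with_bigger_strokes(img, factor=4):
    """Run the network on a downscaled copy, then upscale the result.
    Because the network's receptive field is fixed in pixels, features
    (brushstrokes) cover a larger fraction of the smaller image and so
    appear bigger after upscaling -- at the cost of fine detail."""
    h, w = img.shape[:2]
    small = nn_resize(img, h // factor, w // factor)
    styled = stylize(small)
    return nn_resize(styled, h, w)
```

The `factor` parameter then controls the stroke scale directly, independent of the output resolution.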

Related

CoreML output image size is not the same as the model prediction output image size

I tried to change the output size of an existing MLModel (https://drive.google.com/file/d/16JEWh48fgQc8az7avROePOd-PYda0Yi2/view?usp=sharing) from 2048x2048 to 1024x1024.
I used this script to change the output image size:
import coremltools as ct

spec = ct.utils.load_spec("myModel.mlmodel")
output = spec.description.output[0]
output.type.imageType.height = 1024
output.type.imageType.width = 1024
ct.utils.save_spec(spec, "myModelNew.mlmodel")
The new model is saved correctly, with the expected output size in the prediction tab:
But when I run it, the new model still generates 2048x2048 output like the original model.
Any idea why it behaves like this? Thank you for the help!
Core ML does not automatically resize the output based on the dimensions you provide. Those dimensions just tell the user of the model what to expect. For this particular model, the output size really is 2048x2048, not 1024x1024.
The output size of the model is determined by its architecture. If you want a 1024x1024 output, you may need to remove certain layers from the architecture. Or you can add a downsampling layer to the end of the model that converts 2048x2048 into 1024x1024.
What you did is change the model description, but what you need to do is change the model itself.
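To illustrate the second suggestion above (changing the model rather than its description): a 2x downsampling step turns a 2048x2048 output into 1024x1024. In Core ML this would mean appending e.g. a pooling or bilinear-resize layer to the spec; a framework-agnostic sketch of the operation itself, using 2x2 average pooling in NumPy:

```python
import numpy as np

def downsample_2x(img):
    """2x2 average pooling over an (H, W) or (H, W, C) array --
    the kind of layer the answer suggests appending to the model
    so that a 2048x2048 output becomes 1024x1024."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w].astype(np.float64)
    # average each non-overlapping 2x2 block
    return (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0
```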

If the input image size differs from the size of the images used for training, does that impact the final segmentation accuracy?

I am doing a project for uni where I am detecting an object with a U-Net and then calculating the width of the object. I trained my U-Net on images of size 300x300. Now I have got to a point where I want to improve the accuracy of the width measurement, and for that reason I want to feed larger images (600x600, say) into the model. Does this difference in size (training on 300x300, using on 600x600) impact the overall segmentation quality?
I'm guessing it does but am not sure.
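A fully convolutional U-Net will accept the larger input, but segmentation quality can still suffer: the receptive field of each output pixel is fixed in pixels, so at 600x600 every object occupies twice as many pixels and the network effectively sees it at half the trained scale. The standard receptive-field arithmetic (my own sketch, not from the thread) makes this concrete:

```python
def receptive_field(layers):
    """Compute the receptive field (in input pixels) of one output unit
    for a chain of conv/pool layers given as (kernel, stride) pairs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field
        jump *= stride             # stride compounds the spacing
    return rf

# e.g. three 3x3 convs, each followed by a 2x2 stride-2 pool
# (a hypothetical stack, not the exact U-Net encoder)
stack = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1), (2, 2)]
```

Whatever the exact stack, the receptive field in pixels stays the same at 600x600, so it covers half as much of the object as it did during training.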

How to train an FCN network when the image sizes are not fixed and vary?

I have already trained the FCN model with fixed-size 256x256 images. Could I ask the experts how I can train the same model when the image size varies from one image to the next?
I really appreciate your advice.
Thanks
You can choose one of these strategies:
1. Batch = 1 image
By treating each image as its own batch, you can reshape the net in the forward() method of the data layer (rather than in reshape()), thus changing the net at each iteration.
+ Write the reshape once in the forward method and you no longer need to worry about input shapes and sizes.
- Reshaping the net often requires allocation/deallocation of CPU/GPU memory, and therefore it takes time.
- A single image per batch may be too small a batch.
For example (assuming you are using a "Python" layer for input):
class InputLayer(caffe.Layer):
    def reshape(self, bottom, top):
        pass  # do not reshape here

    def forward(self, bottom, top):
        # reshape the blobs - this propagates the new shape
        # to the rest of the net at each iteration
        top[0].reshape( ... )
        top[1].reshape( ... )
        # feed the data to the net
        top[0].data[...] = current_img
        top[1].data[...] = current_label
2. Random crops
You can decide on a fixed input size and then randomly crop all input images (and the corresponding ground truths).
+ No need to reshape every iteration (faster).
+ Control over the model size during training.
- Need to implement random crops for images and labels.
3. Fixed size
Resize all images to the same size (like in SSD).
+ Simple.
- Images are distorted if they do not all share the same aspect ratio.
- You are not invariant to scale.
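Strategy 2 (random crops) is straightforward to implement yourself; a minimal NumPy sketch (my own, with placeholder image and label arrays) that crops the image and its ground-truth map at the same offset:

```python
import numpy as np

def random_crop(img, label, size, rng=None):
    """Crop the image and its label map at the same random offset,
    so the ground truth stays aligned with the cropped pixels."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    ch, cw = size
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return (img[top:top + ch, left:left + cw],
            label[top:top + ch, left:left + cw])
```

The key point is that a single offset is drawn and applied to both arrays; cropping them independently would silently corrupt the training signal.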

Caffe: variable input-image size for the VGG network

I am trying to use Caffe to extract features from a convolutional layer rather than an FC layer of the VGG network. The theoretical input image size may be arbitrary in this situation. But it seems that the VGG network was trained on images cropped to 224x224 pixels. So I define an input data layer in the deploy.prototxt:
layers {
  name: "data"
  type: MEMORY_DATA
  top: "data"
  top: "label"
  transform_param {
    mirror: false
    crop_size: 224
    mean_value: 129.1863
    mean_value: 104.7624
    mean_value: 93.5940
  }
  memory_data_param {
    batch_size: 1
    channels: 3
    width: 224
    height: 224
  }
}
I tried to modify width = 500, height = 500, crop_size = 500, but it failed. Caffe throws this error: "Cannot copy param 0 weights from layer 'fc6'; shape mismatch. Source param shape is 1 1 4096 25088 (102760448); target param shape is 4096 131072 (536870912). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer."
How can I run on images that are too big for the input layer without cropping?
You should resize your image to 224x224 first, since VGG was trained at that resolution. It makes little sense to extract features at a higher resolution.
For resizing and cropping, you can use my specialized ImageData layer: https://github.com/yihui-he/caffe-pro
Either you use exactly the same image size, or you retrain the dense layers for your new image size.
You can reuse the convolutional kernels, but you cannot reuse the dense layers for a different image size.
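The numbers in the error message follow directly from VGG's architecture: after five 2x pooling stages, a 224x224 input yields a 7x7x512 feature map, which fc6 flattens to 25088 inputs; a 500x500 input yields a 16x16x512 = 131072-element map, so the pretrained fc6 weights no longer fit. A quick check of that arithmetic (the ceil rounding mirrors Caffe's pooling output formula):

```python
def fc6_input_size(side, channels=512, pools=5):
    """Flattened feature-map size reaching VGG's fc6 layer:
    spatial side after `pools` 2x poolings, times `channels` maps."""
    for _ in range(pools):
        side = (side + 1) // 2  # Caffe pooling rounds up
    return side * side * channels
```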

Fully Convolution Networks with Varied inputs

I have a fully convolutional neural network, U-Net, which is described in the paper below.
https://arxiv.org/pdf/1505.04597.pdf
I want to use it for pixelwise classification of images. My training images come in two sizes: 512x512 and 768x768. I use reflection padding of size (256,256,256,256) for the former in the initial step, and (384,384,384,384) for the latter. I do successive padding before convolutions to get output the size of the input.
But since my padding depends on the image/input size, I can't build a generalised model (I am using Torch).
How is the padding done in such cases?
I am new to deep learning; any help would be great. Thanks.
Your model will only accept images of the size of the first layer. You have to pre-process all of them before forwarding them to the network. To do so, you can use:
image.scale(img, width, height, 'bilinear')
img is the image to scale; width and height are the size of the first layer of your model (if I'm not mistaken it is 572x572); 'bilinear' is the algorithm used to scale the image.
Keep in mind that it might be necessary to subtract the mean of the image or to convert it to BGR (depending on how the model was trained).
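The mean-subtraction and RGB-to-BGR steps mentioned above are easy to get wrong; a minimal NumPy sketch (my own, with an assumed per-channel mean, not the exact values for this model) of that preprocessing:

```python
import numpy as np

def preprocess(img, mean_bgr=(93.594, 104.7624, 129.1863)):
    """Convert an (H, W, 3) RGB array to BGR and subtract a per-channel
    mean (the default values here are assumptions for illustration)."""
    bgr = img[:, :, ::-1].astype(np.float64)  # RGB -> BGR channel flip
    return bgr - np.asarray(mean_bgr)
```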
The first thing to do is to process all of your images to the same size. The CONV layer input requires all images to be of the specified dimensions.
Caffe lets you reshape within the prototxt file; in Torch, I think there is a comparable command you could drop at the front of createModel, but I don't recall its name. If not, you'll need to do it outside the model flow.