Is there a way to train instance segmentation with only segmentation masks, not bounding boxes? - image-segmentation

Is there a way to train instance segmentation with only segmentation masks, not bounding boxes?
I ask because I can't get the bounding boxes easily: the dataset contains many augmented images.
Thank you
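If the framework only needs boxes that are consistent with the masks, they can usually be derived from the instance masks themselves rather than annotated by hand, even for augmented images. A minimal NumPy sketch (assuming one binary HxW mask per instance; the function name is just for illustration):

import numpy as np

def bbox_from_mask(mask: np.ndarray):
    """Derive an axis-aligned box (x_min, y_min, x_max, y_max) from a binary (H, W) instance mask."""
    ys, xs = np.nonzero(mask)        # coordinates of all foreground pixels
    if xs.size == 0:
        return None                  # empty mask, no box
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

Running this once per instance mask after augmentation keeps the boxes consistent with whatever geometric transforms were applied to the masks.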

Related

CNN as a backbone of U-net using segmentation models library

I'm confused about using a CNN (VGG, ResNet) as the backbone of a U-Net. I'm using the segmentation_models library to use VGG and ResNet as the backbone. My input shape is 512x512x3. As far as I understand, in U-Net a skip connection is taken before every layer where downsampling happens (for example, max-pooling for VGG or a conv with 2x2 stride for ResNet). But in the model summary for both the VGG- and ResNet-based backbones I see that the skip connections start from the second downsampling (256x256x64), and there is no skip connection from the 512 resolution. Can someone explain the reason? I've added the detailed model diagram for reference.
I was following this code, https://github.com/bnsreenu/python_for_microscopists/blob/master/214_multiclass_Unet_sandstone_segm_models_ensemble.py.
Thanks in advance.
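For reference, a minimal sketch of how such a backbone U-Net is typically built with the segmentation_models library; the class count, loss, and backbone name here are illustrative assumptions, not values taken from the linked script:

import segmentation_models as sm

sm.set_framework('tf.keras')        # assumption: TF2 / tf.keras setup

model = sm.Unet(
    backbone_name='vgg16',          # or 'resnet34', 'resnet50', ...
    input_shape=(512, 512, 3),
    classes=4,                      # hypothetical number of classes
    activation='softmax',
    encoder_weights='imagenet',
)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()                     # the encoder-to-decoder skip connections show up here

Inspecting model.summary() (or plotting the model) is the quickest way to see exactly which encoder feature maps the library wires into the decoder for a given backbone.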

What is the relationship between instance segmentation and semantic segmentation from the perspective of neural networks?

I am clear about the tasks of instance segmentation and semantic segmentation. However, from the perspective of the neural networks, what is the relationship between them? Namely, is it feasible to realize instance segmentation by improving or modifying a neural network for semantic segmentation, e.g. DeepLab? If so, what operations are usually used? Many thanks.
Let's assume you want to know where a desired class exists in an image; you then build a network that predicts, for each pixel, the probability that the pixel belongs to that class. This is semantic segmentation.
When you want to know where each individual instance of that class is, that is instance segmentation (picture for example).
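As a very simplified illustration of the relationship (not how networks such as Mask R-CNN actually do it): given a binary semantic mask for one class, non-touching instances can be separated with connected-component labelling, e.g. with SciPy:

import numpy as np
from scipy import ndimage

# binary semantic mask for one class, shape (H, W)
semantic_mask = np.zeros((8, 8), dtype=np.uint8)
semantic_mask[1:3, 1:3] = 1    # first blob
semantic_mask[5:7, 4:7] = 1    # second blob

# connected components turn the per-class mask into per-instance labels
instance_labels, num_instances = ndimage.label(semantic_mask)
print(num_instances)           # 2 separate instances

Real instance-segmentation networks go further (handling touching and overlapping objects), but this shows why instance segmentation is strictly more information than the semantic mask alone.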

niftynet multi-class 3D segmentation with dense vnet

Neural network newbie here. I've been testing NiftyNet and achieved decent single-class 3D segmentation predictions on my own MRI data set with dense_vnet. However, I ran out of luck when I tried to add a second label. The network seems to spot the correct organs but can't get rid of additional artifacts, as if it cannot get out of a local minimum or doesn't have enough degrees of freedom. This is one of the better-looking prediction slices; it does show some correct labels but also additional noise.
Why would a single-class segmentation work better than a multi-class segmentation? Is it even reasonable to expect good multi-class 3D segmentation results out of DenseVnet? If yes, is there a specific approach to improve the results?
P.S.
Niftynet's site refers to stackoverflow for general questions.
Apparently, DenseVnet does handle multi-class segmentation okay. They have provided a ready-made model with a Dice loss extension. It worked with my MRI data without any pre-processing, even though it was designed for CT images and Hounsfield units.
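For context, a generic multi-class (soft) Dice loss looks roughly like the sketch below. This is the standard formulation, not NiftyNet's exact implementation, and the (batch, classes, spatial...) tensor layout is an assumption:

import numpy as np

def soft_dice_loss(probs, one_hot_targets, eps=1e-6):
    """Mean soft Dice loss over samples and classes.

    probs:            (N, C, ...) predicted class probabilities
    one_hot_targets:  (N, C, ...) one-hot ground-truth labels
    """
    axes = tuple(range(2, probs.ndim))                 # sum over the spatial/volumetric axes
    intersection = np.sum(probs * one_hot_targets, axis=axes)
    denom = np.sum(probs, axis=axes) + np.sum(one_hot_targets, axis=axes)
    dice = (2.0 * intersection + eps) / (denom + eps)  # per-sample, per-class Dice score
    return 1.0 - dice.mean()

Because the Dice score is averaged over classes, small or rare labels contribute as much as large ones, which is often why a Dice-style loss behaves better than plain cross-entropy for multi-class organ segmentation.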

How to get the dimensions of an activation right as an input to a pooling layer

I am using AlexNet; you can see the structure of the network below:
AlexNet structure with outputs
I used the activations function in MATLAB to get the features of my data from the output of the conv5 layer. The output is a feature vector with a dimension of 43264 for each single image (I have 14000 images).
I did some processing on this output with no change in the dimension, so it is still 43264.
I want to feed the data back into the network starting at pooling layer 5 and train the network.
As you can see in the structure of AlexNet, the input of pooling layer 5 should be 13x13x256. So I reshaped the 43264 feature vector into a 13x13x256 matrix, and the whole training set is a 14000x1 cell array in which each cell holds a 13x13x256 matrix.
I used the following code to train the network:
net = trainNetwork(Trainingset, labels, Layers, trainingOptions);
I still get an error saying there is an unexpected input to the pooling layer.
Any idea, please?
Thanks in advance.
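One thing worth double-checking, sketched here in Python/NumPy purely as an illustration of the pitfall: a flat 43264-element vector only maps back to 13x13x256 correctly if the reshape uses the same element ordering that was used to flatten it (MATLAB is column-major):

import numpy as np

features = np.arange(13 * 13 * 256, dtype=np.float32)   # stand-in for one 43264-long feature vector

# Column-major ('F', as MATLAB uses) vs row-major ('C') give different spatial layouts:
vol_f = features.reshape((13, 13, 256), order='F')
vol_c = features.reshape((13, 13, 256), order='C')
print(np.array_equal(vol_f, vol_c))                     # False: the ordering matters

If the layout is wrong, the values still form a 13x13x256 array, but they no longer correspond to the spatial positions the pooling layer expects.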

Upsampling in Semantic Segmentation

I am trying to implement a paper on semantic segmentation, and I am confused about how to upsample the prediction map produced by my segmentation network to match the input image size.
For example, I am using a variant of ResNet-101 as the segmentation network (as used in the paper). With this network structure, an input of size 321x321 (again, as used in the paper) produces a final prediction map of size 41x41xC (C is the number of classes). Because I have to make pixel-level predictions, I need to upsample it to 321x321xC. PyTorch provides a function to upsample to an output size that is a multiple of the prediction-map size, so I cannot use that method directly here.
Because this step is involved in every semantic segmentation network, I am sure there should be a standard way to implement this.
I would appreciate any pointers. Thanks in advance.
Maybe the simplest thing you can try is:
upsample 8 times, so your 41x41 input becomes 328x328
perform center cropping to get your desired 321x321 shape (for instance, something like input[:, :, 3:-4, 3:-4])
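A sketch of both steps in PyTorch (C and the sizes are the ones from the question); note that torch.nn.functional.interpolate can also take an arbitrary target size directly:

import torch
import torch.nn.functional as F

C = 21                                    # number of classes (example value)
logits = torch.randn(1, C, 41, 41)        # prediction map from the network

# Option 1: upsample by 8x (41 -> 328), then crop 328x328 down to 321x321
up = F.interpolate(logits, scale_factor=8, mode='bilinear', align_corners=False)
cropped = up[:, :, 3:-4, 3:-4]            # (1, C, 321, 321)

# Option 2: interpolate straight to the target size
direct = F.interpolate(logits, size=(321, 321), mode='bilinear', align_corners=False)
print(cropped.shape, direct.shape)

Interpolating directly to (321, 321) avoids the off-center crop, which is why it is the more common choice in segmentation code.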