How to select masks with features among only black ones - semantic segmentation

I'm working on a very unbalanced dataset for semantic segmentation. The majority of the image masks contain only background (0-valued pixels) and no feature (1-valued pixels), leading to a very biased and inefficient U-Net network.
I'm looking for code to select (from folders) only the masks, and the corresponding images, that contain at least one feature to segment. Any ideas?

I think there are a lot of ways to achieve this, but the first one that comes to my mind is to check whether there are non-zero values in your mask.
You didn't tell us which framework you use, so let's assume it's Python; you could try something like:
import os
import numpy as np
import cv2
numpy_images_array = list()
numpy_masks_array = list()
# assuming masks and images share matching filenames, sorting keeps the two folders aligned
for mask_name, img_name in zip(sorted(os.listdir(MASKS_DIR)), sorted(os.listdir(IMG_DIR))):
    mask = cv2.imread(os.path.join(MASKS_DIR, mask_name), cv2.IMREAD_GRAYSCALE)
    if np.any(mask != 0):  # or: if len(np.unique(mask)) > 1
        numpy_masks_array.append(mask)
        numpy_images_array.append(cv2.imread(os.path.join(IMG_DIR, img_name)))

Different results with torchvision transforms

Correct me if I am wrong.
The 'classic' way to pass images through torchvision transforms is to use Compose, as in its doc page. This, however, requires a PIL Image as input.
An alternative is to use ConvertImageDtype with torch.nn.Sequential. This 'bypasses' the need for a PIL Image, and in my case it is much faster because I work with numpy arrays.
My problem is that the results are not identical.
Below is an example with a custom Normalize. I would like to use torch.nn.Sequential (tr) because it is faster for my needs, but the error compared to Compose (tr2) is very large (~810).
from PIL import Image
import torchvision.transforms as T
import numpy as np
import torch
o = np.random.rand(64, 64, 3) * 255
o = np.array(o, dtype=np.uint8)
i = Image.fromarray(o)
tr = torch.nn.Sequential(
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.48145466, 0.4578275, 0.40821073], [0.26862954, 0.26130258, 0.27577711]),
)
tr2 = T.Compose([
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
])
out = tr(torch.from_numpy(o).permute(2,0,1).contiguous())
out2 = tr2(i)
print(((out - out2) ** 2).sum())
The interpolation method seems to matter a lot: with the default BILINEAR the error is ~7, but I need to use BICUBIC.
The problem seems to lie in ConvertImageDtype vs ToTensor, because if I replace ToTensor with ConvertImageDtype the results are identical (I cannot do it the other way around, because ToTensor is not a subclass of Module and so cannot be used with nn.Sequential).
However, the following gives identical results:
tr = torch.nn.Sequential(
    T.ConvertImageDtype(torch.float),
)
tr2 = T.Compose([
    T.ToTensor(),
])
out = tr(torch.from_numpy(o).permute(2,0,1).contiguous())
out2 = tr2(i)
print(((out - out2) ** 2).sum())
This means that the interpolation step changes something in the results, and the difference only shows up when comparing ToTensor with ConvertImageDtype.
Any input is appreciated.
This is documented here:
The output image might be different depending on its type: when downsampling, the interpolation of PIL images and tensors is slightly different, because PIL applies antialiasing. This may lead to significant differences in the performance of a network. Therefore, it is preferable to train and serve a model with the same input types. See also below the antialias parameter, which can help making the output of PIL images and tensors closer.
Passing antialias=True produces almost identical results.
This is interesting because the doc says it "can be set to True for InterpolationMode.BILINEAR only mode", yet I am using BICUBIC and it still works.
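For concreteness, here is a minimal sketch of that fix, reusing o and out2 from the question above; the only change to the tensor pipeline is antialias=True on Resize, and tr_aa/out_aa are just illustrative names:
tr_aa = torch.nn.Sequential(
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC, antialias=True),
    T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.48145466, 0.4578275, 0.40821073], [0.26862954, 0.26130258, 0.27577711]),
)
out_aa = tr_aa(torch.from_numpy(o).permute(2, 0, 1).contiguous())
print(((out_aa - out2) ** 2).sum())  # far smaller than the ~810 seen without antialiasing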

How to use FaceNet and DBSCAN on multiple embeddings identities?

I have the following setting:
A surveillance system takes photos of people's faces (there is a varying number of photos for each person).
I run FaceNet on each photo and get a list of embedding vectors for each person (each person is represented by a list of embeddings, not by a single one).
The problem:
I want to cluster the observed people using DBSCAN, but I need to guarantee that face embeddings of the same person go to the same cluster (remember we can have multiple photos of the same person, and we already know they must belong to the same cluster).
One solution could be to compute a "mean" or average embedding for each person, but I believe this loss of information is going to produce bad results.
Another solution could be to concatenate N embeddings (with N constant) into a single 512xN vector and pass that to DBSCAN, but the problem there is that the order in which the embeddings are concatenated changes the result.
Has anyone faced this same problem?
deepface wraps the FaceNet face recognition model. The regular face recognition process is shown below.
#!pip install deepface
from deepface import DeepFace
my_set = [
    ["img1.jpg", "img2.jpg"],
    ["img1.jpg", "img3.jpg"],
]
obj = DeepFace.verify(my_set, model_name='Facenet')
for i in obj:
    print(i["distance"])
If you need the embeddings generated by FaceNet, you can use deepface for that as well.
from deepface.commons import functions
from deepface.basemodels import Facenet
model = Facenet.loadModel()
#this detects and aligns faces. Facenet expects 160x160x3 shaped inputs.
img1 = functions.preprocess_face("img1.jpg", target_size = (160, 160))
img2 = functions.preprocess_face("img2.jpg", target_size = (160, 160))
#this finds embeddings for images
img1_embedding = model.predict(img1)
img2_embedding = model.predict(img2)
Embeddings will be 128-dimensional vectors for Facenet. You can run any clustering algorithm on the embeddings. I have applied k-means for this kind of study; I don't have any experience with DBSCAN, but you can apply it once you have the embeddings.
Besides, you can adopt different face recognition models within deepface, such as VGG-Face, OpenFace, Facebook DeepFace, DeepID and Dlib.
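As a rough sketch of that last step (not part of the original answer), the embeddings computed above can be fed straight into scikit-learn's DBSCAN; the eps value below is a made-up placeholder that needs tuning for your data:
import numpy as np
from sklearn.cluster import DBSCAN

# stack the 128-d Facenet embeddings computed above into one (n_samples, 128) array
embeddings = np.vstack([img1_embedding, img2_embedding])
# eps is a placeholder; tune it (and min_samples / the metric) on your own data
labels = DBSCAN(eps=10.0, min_samples=1, metric="euclidean").fit_predict(embeddings)
print(labels)  # equal labels mean the same cluster; -1 marks noise points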

Do I have to preprocess test data using neural networks?

I am using Keras (version 2.0.0) and I'd like to make use of pretrained models such as VGG16.
To get started, I ran the example from the Keras documentation site (https://keras.io/applications/) for extracting features with VGG16:
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np
model = VGG16(weights='imagenet', include_top=False)
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
features = model.predict(x)
The preprocess_input() function bothers me (it performs zero-centering by the mean pixel, as can be seen in the source code).
Do I really have to preprocess input data (validation/test data) before using a trained model?
a)
If yes, can one conclude that you always have to know which preprocessing steps were performed during the training phase?!
b)
If no: Does preprocessing of validation/test data cause a bias?
I appreciate your help.
Yes, you should use the preprocessing step. You could retrain the model without it, but then the first layers would have to learn to center your data, which is a waste of parameters.
If you do not recenter, your performance will suffer.
Great thread on reddit: https://www.reddit.com/r/MachineLearning/comments/3q7pjc/why_is_removing_the_mean_pixel_value_from_each/
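In code, a minimal sketch of that advice (reusing the question's setup; 'test_image.jpg' is a placeholder): the test image goes through the exact same preprocess_input call before prediction.
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing import image
import numpy as np

model = VGG16(weights='imagenet', include_top=False)
# 'test_image.jpg' stands in for any held-out validation/test image
img = image.load_img('test_image.jpg', target_size=(224, 224))
x = np.expand_dims(image.img_to_array(img), axis=0)
x = preprocess_input(x)  # same mean-pixel centering as during training
features = model.predict(x)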

Scikit-Learn's DPGMM fitting: number of components?

I'm trying to fit a mixture-of-normals model to some data using scikit-learn's DPGMM algorithm. One of the advantages advertised in [0] is that I don't need to specify the number of components, which is good because I do not know the number of components in my data. The documentation states that I only need to specify an upper bound. However, it looks very much like that is not true:
>>> data = numpy.random.normal(loc = 0.0, scale = 1.0, size = 1000)
>>> from sklearn.mixture import DPGMM
>>> d = DPGMM(n_components=5)
>>> d.fit(data.reshape(-1,1))
DPGMM(alpha=1.0, covariance_type='diag', init_params='wmc', min_covar=None,
n_components=5, n_iter=10, params='wmc', random_state=None, thresh=None,
tol=0.001, verbose=0)
>>> d.n_components
5
>>> d.means_
array([[-0.02283383],
[ 0.06259168],
[ 0.00390097],
[ 0.02934676],
[-0.05533165]])
As you can see, the fitting reports five components (the upper bound) even for data clearly sampled from just one normal distribution.
Am I doing something wrong? Did I misunderstand something?
Thanks a lot in advance,
Lukas
[0] http://scikit-learn.org/stable/modules/mixture.html#dpgmm
I recently had similar doubts about the results of this DPGMM implementation. If you check the provided example, you'll notice that DPGMM always returns a model with n_components components; the trick is to remove the redundant ones, which can be done with the predict function.
Unfortunately, this important piece is hidden in a comment in the code example:
# as the DP will not use every component it has access to
# unless it needs it, we shouldn't plot the redundant components
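In code, a rough sketch of that trick (reusing d and data from the question; this is not part of the linked example):
import numpy as np

assignments = d.predict(data.reshape(-1, 1))  # component actually assigned to each sample
used = np.unique(assignments)
print(used)            # indices of the components the DP really uses
print(d.means_[used])  # keep only the means of those components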
Perhaps look at using an improved sklearn solution for this kind of problem, namely the Bayesian Gaussian Mixture. With this model, an upper bound on the number of components must still be given, but once trained the model assigns a weight to each component, which essentially indicates its relevance. Here is a pretty cool visual demo of BGMM in action.
Once you have experimented with training a few BGMMs on your data, you can get a feel for a sensible estimate of the number of components for your given problem.
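A rough sketch of that suggestion, assuming a reasonably recent scikit-learn where BayesianGaussianMixture has replaced DPGMM: fit with a generous upper bound and inspect the learned weights.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

data = np.random.normal(loc=0.0, scale=1.0, size=1000).reshape(-1, 1)
bgmm = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type='dirichlet_process',
    max_iter=500,
)
bgmm.fit(data)
print(bgmm.weights_)  # components with near-zero weight are effectively unused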

Find the connected component in matlab

Given a BW (binary) image that contains some connected components, and a single pixel P in the image, how do I find which component contains P? It is guaranteed that P always lies in the white area of one of the connected components.
Currently, I use CC = bwconncomp(BW) and then iterate over each component with a 'for' loop. Within each component, I iterate over its pixel indices and check whether each one equals the (linear) index of pixel P; if I find it, I record the number of that connected component.
However, this seems inefficient for such a simple task. Any suggestion for improvement? Thank you very much in advance.
MATLAB provides multiple functions that implement connected-component labeling in different ways.
In your example, I would suggest bwlabel.
http://www.mathworks.com/help/images/ref/bwlabel.html
[L, num] = bwlabel(imgBW)
This performs a full-image connected-component labeling on a black-and-white image.
After calling this function, the label that pixel P belongs to can be read directly off the label matrix L, as in label_to_find = L(row, col). Simple as that.
To extract a mask image for that label, use logical(L == label_to_find).
If you use a different software package such as OpenCV you may be able to get better performance (by cutting unnecessary or redundant computation), but in MATLAB the emphasis is on convenience and prototyping speed.