How to use FaceNet and DBSCAN with multiple embeddings per identity? - cluster-analysis

I have the following setting:
A surveillance system takes photos of people's faces (there is a varying number of photos for each person).
I run FaceNet on each photo and get a list of embedding vectors per person (each person is represented by a list of embeddings, not by a single one).
The problem:
I want to cluster the observed people using DBSCAN, but I need to guarantee that face embeddings from the same person go to the same cluster (remember we can have multiple photos of the same person, and we already know they must belong to the same cluster).
One solution could be to take a "mean" or average embedding for each person, but I believe this loss of information is going to produce bad results.
Another solution could be to concatenate N embeddings (with N constant) into a single vector and pass that 512xN vector to DBSCAN, but the problem with this is that the order in which the embeddings are appended to this vector is going to produce different results.
Has anyone faced this same problem?
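One way to enforce that constraint is to cluster people rather than photos: define the distance between two people as a distance between their embedding sets (for example the minimum pairwise distance) and hand DBSCAN a precomputed distance matrix. A minimal sketch of the idea, where persons (a list of (n_i, 512) arrays) and the eps value are placeholders, not tuned values:
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import DBSCAN

def set_distance(a, b):
    # distance between two people = closest pair of photos between them
    return cdist(a, b).min()

def cluster_people(persons, eps=0.8):
    n = len(persons)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = set_distance(persons[i], persons[j])
    # one label per person, so all photos of a person land in the same cluster by construction
    return DBSCAN(eps=eps, min_samples=1, metric="precomputed").fit_predict(dist)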

deepface wraps the facenet face recognition model. The regular face recognition process is shown below.
#!pip install deepface
from deepface import DeepFace

# pairs of images to verify against each other
my_set = [
    ["img1.jpg", "img2.jpg"],
    ["img1.jpg", "img3.jpg"],
]

obj = DeepFace.verify(my_set, model_name = 'Facenet')

for i in obj:
    print(i["distance"])
If you need the embeddings generated by facenet, you can adopt deepface as well.
from deepface.commons import functions
from deepface.basemodels import Facenet
model = Facenet.loadModel()
#this detects and aligns faces. Facenet expects 160x160x3 shaped inputs.
img1 = functions.preprocess_face("img1.jpg", target_size = (160, 160))
img2 = functions.preprocess_face("img2.jpg", target_size = (160, 160))
#this finds embeddings for images
img1_embedding = model.predict(img1)
img2_embedding = model.predict(img2)
Embeddings will be 128-dimensional vectors for Facenet. You can run any clustering algorithm on the embeddings. I have applied k-means for this kind of study. I don't have any experience with DBSCAN, but you can apply it once you have the embeddings.
Besides, you can adopt different face recognition models within deepface such as VGG-Face, OpenFace, Facebook DeepFace, DeepID and Dlib.
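If you do try DBSCAN on the embeddings, a minimal scikit-learn sketch could look like the following; eps depends heavily on the dataset and distance metric, so the value here is only a placeholder:
import numpy as np
from sklearn.cluster import DBSCAN

# one 128-d Facenet vector per face, e.g. taken from model.predict as above
embeddings = np.array([img1_embedding[0], img2_embedding[0]])

clustering = DBSCAN(eps=10, min_samples=2, metric="euclidean").fit(embeddings)
print(clustering.labels_)  # -1 marks noise, other integers are cluster ids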

Related

I want to cluster sentences but I don't know in advance how many clusters there will be

I have calculated the embeddings with the help of doc2vec and I have also calculated the distances between sentences in vector form. Now I have the pairwise distances between my sentences. How can I cluster them without specifying the number of clusters? I have used k-means and an agglomerative algorithm, but they are not giving me good results. Can anybody tell me the best method to determine the optimal number of clusters?
Try this. If it doesn't do what you want, I have a few other code samples to share. This may be the best option, but the best option can change based on the dataset that you feed into the algorithm.
import numpy as np
from sklearn.cluster import AffinityPropagation
import distance

words = "kitten belly squooshy merley best eating google feedback face extension impressed map feedback google eating face extension climbing key".split(" ") #Replace this line
words = np.asarray(words) #So that indexing with a list will work

# negative Levenshtein distance serves as the precomputed similarity matrix
lev_similarity = -1 * np.array([[distance.levenshtein(w1, w2) for w1 in words] for w2 in words])

affprop = AffinityPropagation(affinity="precomputed", damping=0.5)
affprop.fit(lev_similarity)

for cluster_id in np.unique(affprop.labels_):
    exemplar = words[affprop.cluster_centers_indices_[cluster_id]]
    cluster = np.unique(words[np.nonzero(affprop.labels_ == cluster_id)])
    cluster_str = ", ".join(cluster)
    print(" - *%s:* %s" % (exemplar, cluster_str))
Result:

AffinityPropagation with Sentence Transformer not converging

I'm trying to cluster a text dataset using Sentence Transformer and Affinity Propagation, but I keep getting "ConvergenceWarning: Affinity propagation did not converge, this model will not have any cluster centers." and, therefore, no clustering. I've been trying to figure it out for some time but couldn't make it work, and there is little documentation online for text clustering with this algorithm.
from sklearn.cluster import AffinityPropagation
from sentence_transformers import SentenceTransformer  # import missing in the original snippet
import numpy as np

model = SentenceTransformer('sentence-transformers/paraphrase-xlm-r-multilingual-v1')
sentence_embeddings = model.encode(texts)

affprop = AffinityPropagation()
af = affprop.fit(sentence_embeddings)
cluster_centers_indices = af.cluster_centers_indices_
len(cluster_centers_indices) # this line returns zero
Has anyone run into this same problem and found a workaround to suggest?
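One common first step (a suggestion, not a verified fix for this dataset) is to raise damping and max_iter, since the warning means the message-passing updates were still oscillating when fitting stopped:
from sklearn.cluster import AffinityPropagation

# higher damping smooths the updates; more iterations give them time to settle
affprop = AffinityPropagation(damping=0.9, max_iter=1000, convergence_iter=50)
af = affprop.fit(sentence_embeddings)
print(len(af.cluster_centers_indices_))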

Interclass and Intraclass classification structure of CNN

I am working on an inter-class and intra-class classification problem with one CNN, such that first there are two classes, Cat and Dog, then within Cat there is a classification into three different breeds of cats, and within Dog there are 5 different breeds of dogs.
I haven't tried the coding yet; I'm just working out whether this approach is feasible.
My question is: what would be a feasible design for this kind of problem?
For training, I am thinking of first designing a CNN-1 network that will differentiate cat and dog and gather the image data of all the training images. After the separation of cats and dogs, CNN-2 and CNN-3 will train further on these images for each breed of dog and cat, respectively. I am just not sure how the testing will work in this situation.
I have approached a similar problem previously in Python. Hopefully this is helpful, and you can come up with an alternative implementation in Matlab if that is what you are using.
After all was said and done, I landed on a single model for all predictions. For your purpose you could have one binary output for dog vs. cat, another multi-class output for the dog breeds, and another multi-class output for the cat breeds.
Using Tensorflow, I created a mask for the irrelevant classes. For example, if the image was of a cat, then all of the dog breeds are irrelevant, and they should not impact model training for that example. This required a customized TF Dataset (that converted 0's to -1 for the mask) and a customized loss function that returned 0 error when the mask was present for that example.
Finally, for the training process: specific to your question, you will have to create custom accuracy functions that handle the mask values how you want them to, but otherwise this part of the process should be standard. It was best practice to evenly spread out the classes among the training data, but they can all be trained together.
If you google "Multi-Task Training" you can find additional resources for this problem.
Here are some code snips if you are interested:
For the customized TF dataset that masked irrelevant labels...
import tensorflow as tf
from multiprocessing import cpu_count

# Replace 0's with -1 for mask when there aren't any labels
def produce_mask(features):
    for filt, tensor in features.items():
        if "target" in filt:
            condition = tf.equal(tf.math.reduce_sum(tensor), 0)
            features[filt] = tf.where(condition, tf.ones_like(tensor) * -1, tensor)
    return features

def create_dataset(filepath, batch_size=10):
    ...
    # **** This is where the mask was applied to the dataset
    dataset = dataset.map(produce_mask, num_parallel_calls=cpu_count())
    ...
    return parsed_features
Custom loss function: I was using binary cross-entropy because my problem was multi-label. You will likely want to adapt this to categorical cross-entropy.
from tensorflow.keras import backend

# Custom loss function
def masked_binary_crossentropy(y_true, y_pred):
    # zero out both labels and predictions wherever the label was masked to -1
    mask = backend.cast(backend.not_equal(y_true, -1), backend.floatx())
    return backend.binary_crossentropy(y_true * mask, y_pred * mask)
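For what it's worth, a categorical adaptation could look like the sketch below (my own adaptation, not code from the original answer); it zeroes the per-example loss instead of the labels, so the softmax output never meets an all-zero target:
# A possible masked categorical cross-entropy (an assumption, not the author's code)
def masked_categorical_crossentropy(y_true, y_pred):
    # 1.0 for rows carrying a real one-hot label, 0.0 for fully-masked (-1) rows
    mask = backend.cast(backend.not_equal(backend.max(y_true, axis=-1), -1), backend.floatx())
    y_true_clean = backend.clip(y_true, 0.0, 1.0)  # turn the -1 sentinels back into 0s
    return backend.categorical_crossentropy(y_true_clean, y_pred) * mask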
Then for the custom accuracy metrics. I was using top-k accuracy; you may need to modify it for your purposes, but this will give you the general idea. Compared to the loss function, instead of converting everything to 0, which would over-inflate the accuracy, this function filters those values out entirely. That works because the outputs are measured individually, so each output (binary, cat breed, dog breed) has a different accuracy measure filtered only to the relevant examples.
backend is the Keras backend.
import tensorflow as tf
from tensorflow.keras import backend
from tensorflow.keras.metrics import top_k_categorical_accuracy

def top_5_acc(y_true, y_pred, k=5):
    # keep only rows where at least one label differs from the -1 mask value
    mask = backend.cast(backend.not_equal(y_true, -1), tf.bool)
    mask = tf.math.reduce_any(mask, axis=1)
    masked_true = tf.boolean_mask(y_true, mask)
    masked_pred = tf.boolean_mask(y_pred, mask)
    return top_k_categorical_accuracy(masked_true, masked_pred, k)
Edit
No, in the scenario I described above there is only one model, and it is trained on all of the data together. There are 3 outputs from the single model. The mask is a major part of this, as it allows the network to only adjust weights that are relevant to the example. If the image was a cat, then the dog breed prediction does not contribute to the loss.

Do I have to preprocess test data using neural networks?

I am using Keras (version 2.0.0) and I'd like to make use of pretrained models like e.g. VGG16.
In order to get started, I ran the example from the Keras documentation site (https://keras.io/applications/) for extracting features with VGG16:
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np
model = VGG16(weights='imagenet', include_top=False)
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
features = model.predict(x)
The preprocess_input() function bothers me
(the function does zero-centering by the mean pixel, as can be seen by looking at the source code).
Do I really have to preprocess input data (validation/test data) before using a trained model?
a)
If yes, one can conclude that you always have to be aware of which preprocessing steps were performed during the training phase?!
b)
If no: does preprocessing of validation/test data introduce a bias?
I appreciate your help.
Yes, you should use the preprocessing step. You can retrain the model without it, but then the first layers will learn to center your data themselves, so this is a waste of parameters.
If you do not recenter, your performance will suffer.
Great thread on reddit: https://www.reddit.com/r/MachineLearning/comments/3q7pjc/why_is_removing_the_mean_pixel_value_from_each/
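In practice that means running every split through the exact same pipeline; a minimal sketch that reuses the imports and the model from the question above:
def load_and_preprocess(img_path):
    # identical pipeline for training, validation and test images
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    return preprocess_input(x)  # same mean-pixel centering VGG16 was trained with

features = model.predict(load_and_preprocess('elephant.jpg'))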

Scikit-Learn's DPGMM fitting: number of components?

I'm trying to fit a mixed normal model to some data using scikit-learn's DPGMM algorithm. One of the advantages advertised in [0] is that I don't need to specify the number of components, which is good, because I do not know the number of components in my data. The documentation states that I only need to specify an upper bound. However, it looks very much like that is not true:
>>> import numpy
>>> data = numpy.random.normal(loc = 0.0, scale = 1.0, size = 1000)
>>> from sklearn.mixture import DPGMM
>>> d = DPGMM(n_components=5)
>>> d.fit(data.reshape(-1,1))
DPGMM(alpha=1.0, covariance_type='diag', init_params='wmc', min_covar=None,
n_components=5, n_iter=10, params='wmc', random_state=None, thresh=None,
tol=0.001, verbose=0)
>>> d.n_components
5
>>> d.means_
array([[-0.02283383],
[ 0.06259168],
[ 0.00390097],
[ 0.02934676],
[-0.05533165]])
As you can see, the fitting reports five components (the upper bound) even for data clearly sampled from just one normal distribution.
Am I doing something wrong? Did I misunderstand something?
Thanks a lot in advance,
Lukas
[0] http://scikit-learn.org/stable/modules/mixture.html#dpgmm
I recently had similar doubts about the results of this DPGMM implementation. If you check the provided example, you will notice that DPGMM always returns a model with n_components; the trick is to remove the redundant components, which can be done with the predict function.
Unfortunately this important piece is hidden in a comment in the code example.
# as the DP will not use every component it has access to
# unless it needs it, we shouldn't plot the redundant components
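Concretely, reusing d and data from the question, the number of components the model actually uses can be read off via predict:
import numpy as np

labels = d.predict(data.reshape(-1, 1))
# components that never win an assignment are redundant and can be ignored
print(len(np.unique(labels)))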
Perhaps look at using an improved sklearn solution for this kind of problem, namely a Bayesian Gaussian Mixture. With this model, a prior number of components must still be given, but once trained, the model assigns weights to each component, which essentially indicate their relevance. Here is a pretty cool visual demo of BGMM in action.
Once you have experimented with training a few BGMMs on your data, you can get a feel for a sensible estimate to the number of components for your given problem.
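A minimal sketch, assuming a scikit-learn version that ships BayesianGaussianMixture (the question's single-Gaussian data is reused; exact weights vary from run to run):
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

data = np.random.normal(loc=0.0, scale=1.0, size=1000)
bgmm = BayesianGaussianMixture(n_components=5, max_iter=500).fit(data.reshape(-1, 1))
print(bgmm.weights_)  # weights near zero mark effectively unused components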