Find the smallest loss in multi-object detection

I am detecting multiple (up to 10) objects in a video image.
The input is an image and the labels for that image, for example:
[[box1],[box2],[box3],[00...],....]
Each box is represented as xmin, ymin, xmax, and ymax.
What is causing my trouble is the situation when the output of my CNN is something like this:
[[somewhat close to box3],[something],[something...]]
How do I treat a situation like this? If I don't treat it, the first input box will be compared to the "somewhat close to box3" box, which will result in a big loss, so the network will learn wrongly. However, if my input box3 were compared to "somewhat close to box3", the network would learn much better.
Currently what I am doing is:
Repeat 20 times:
1) Train the network for one epoch
2) Test the network on the training data
3) Calculate the IoU for all combinations of boxes
4) Sort the boxes in my input label according to IoU (so if input box 2 has the biggest IoU with output box 1, put input box 2 in the 1st position of the input label)
However, this does not seem to work as expected, especially when the IoU in the first N epochs is 0.
How could I deal with this? Thanks for the help.
edit:
My exact output shape is a (10,4) vector, e.g.
[[xmin,ymin,xmax,ymax], // first bounding box of an object
[xmin,ymin,xmax,ymax], // second...
.... ]
If no object is present in a box, all of its properties are set to 0.
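One standard way to handle this is to treat it as an assignment problem: compute the IoU between every (label, prediction) pair and match them one-to-one before computing the loss. Below is a minimal Python sketch (not tied to any particular framework), assuming the (10,4) [xmin, ymin, xmax, ymax] layout from the edit; scipy's linear_sum_assignment implements the Hungarian algorithm. Note that when every IoU is 0, as in the first epochs, a cost of -IoU degenerates to an arbitrary assignment, so a fallback cost such as the distance between box centres may help there.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    # intersection-over-union of two [xmin, ymin, xmax, ymax] boxes
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def reorder_labels(labels, preds):
    # reorder ground-truth boxes so that labels[i] is compared to preds[i]
    cost = np.array([[-iou(l, p) for p in preds] for l in labels])
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    reordered = np.zeros_like(labels)
    reordered[cols] = labels[rows]
    return reordered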

How to change pixel values of an RGB image in MATLAB?

So what I need to do is to apply an operation like
(x(i,j)-min(x)) / max(x(i,j)-min(x))
which basically converts each pixel value such that the values range between 0 and 1.
First of all, I realised that MATLAB stores the image (rows * cols * colour) in a 3D matrix when using imread:
Image = imread('image.jpg')
So a simple max operation on the image doesn't give me the max pixel value, and I'm not quite sure what it returns (another multidimensional array?). So I tried using something like
max_pixel = max(max(max(Image)))
I thought it worked fine. Similarly, I used min three times. My logic was that I was getting the min pixel value across all 3 colour planes.
After performing the above scaling operation I got an image which seemed to have only 0 or 1 values and nothing in between, which doesn't seem right. Has it got something to do with integer/float rounding?
I = imread('cat.jpg')
maxI = max(max(max(I)))
minI = min(min(min(I)))
new_image = ((I-minI)./max(I-minI))
This gives output of only 1s and 0s which doesn't seem correct.
The other approach I'm trying is to work on all colour planes separately, as done here. But is that the correct way to do it?
I could also loop through all pixels, but I'm assuming that will be time-consuming. I'm very new to this; any help will be great.
If you are not sure what a MATLAB function returns or why, you should always do the following first:
Type help functionName or doc functionName in the command window; in your case, doc max. This will show you the essential must-know information about that specific function, such as what needs to be put in and what will be output.
In the case of the max function, this yields the following results:
M = max(A) returns the maximum elements of an array.
If A is a vector, then max(A) returns the maximum of A.
If A is a matrix, then max(A) is a row vector containing the maximum
value of each column.
If A is a multidimensional array, then max(A) operates along the first
array dimension whose size does not equal 1, treating the elements as
vectors. The size of this dimension becomes 1 while the sizes of all
other dimensions remain the same. If A is an empty array whose first
dimension has zero length, then max(A) returns an empty array with the
same size as A.
In other words, if you use max() on a matrix, it will output a row vector that contains the maximum value of each column (the first non-singleton dimension). If you use max() on an array A of size m x n x 3, it will result in an array of maximum values of size 1 x n x 3. So this answers your question:
I'm not quite sure what it returns(another multidimensional array?)
Moving on:
I thought it worked fine. Similarly I used min thrice. My logic was that I was getting the min pixel value across all 3 colour planes.
This is correct. Alternatively, you can use max(A(:)) and min(A(:)), which is equivalent if you are just looking for the value.
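For readers more at home in Python, NumPy behaves analogously: np.max with an axis argument reduces that dimension, and with no axis it reduces everything, like max(A(:)):
import numpy as np

A = np.arange(24).reshape(2, 4, 3)
print(np.max(A, axis=0).shape)  # (4, 3): maxima along the first dimension
print(A.max())                  # 23: global maximum, like max(A(:))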
And after performing the above operation I got an image which seemed to have only 0 or 1 values and no value in between which doesn't seem right. Has it got something to do with integer/float rounding off?
There is no way for us to know why this happens if you do not post a minimal, complete and verifiable example of your code. It could be that your variables are of a certain type, or it could be an error in your calculations.
The other approach I'm trying is working on all colour planes separately as done here. But is that the correct way to do it?
This depends on what the intended end result is. Normalizing each colour (red, green, blue) separately will give a different result than normalizing all the values at once (in 99% of cases, anyway).
You have a uint8 RGB image.
Just convert it to a double image by
I = imread('https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Cat_poster_1.jpg/1920px-Cat_poster_1.jpg');
I = double(I)./255;
alternatively
I = im2double(I); % does the scaling if needed
Read about image data types
What are you doing wrong?
If what you want to do is convert an RGB image to the [0-1] range, you are approaching the problem badly, regardless of the correctness of your MATLAB code. Let me give you an example of why.
Say you have an image with 2 colors:
a dark red (20,0,0)
a medium blue (0,0,128)
Now you want this changed to [0-1]. How do you scale it? Your suggested approach is to map 128 -> 1 and either 20 -> 20/128 or 20 -> 1 (it doesn't matter which here). However, when you do this, you are changing the colors! You are turning the medium blue into an intense blue (maximum B channel) and making the red far more intense (20/128 instead of 20/255: double the brightness!). And this is with simple colors; with combined RGB values you may even change the color itself, not only the intensity. Therefore, the only correct way to convert to the [0-1] range is to assume your min and max are [0, 255].
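A quick numerical check of this argument; a minimal NumPy sketch (the same arithmetic holds in MATLAB):
import numpy as np

# the two example colors: dark red and medium blue
img = np.array([[[20, 0, 0]], [[0, 0, 128]]], dtype=np.uint8)

by_img_max = img.astype(float) / img.max()  # scale by the observed max, 128
by_range = img.astype(float) / 255.0        # scale by the uint8 range

print(by_img_max[0, 0, 0])  # 0.15625   -- the red is now twice as bright
print(by_range[0, 0, 0])    # 0.0784... -- brightness preserved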

Caffe classification labels in HDF5

I am finetuning a network. In one specific case I want to use it for regression, which works. In another case, I want to use it for classification.
For both cases I have an HDF5 file with a label. For regression, this is just a 1-by-1 numpy array that contains a float. I thought I could use the same label for classification, after changing my EuclideanLoss layer to SoftmaxLoss. However, I then get a negative loss, like so:
Iteration 19200, loss = -118232
Train net output #0: loss = 39.3188 (* 1 = 39.3188 loss)
Can you explain if, and if so what, goes wrong? I do see that the training loss is about 40 (which is still terrible), but does the network still train? The negative loss just keeps getting more negative.
UPDATE
After reading Shai's comment and answer, I have made the following changes:
- I made the num_output of my last fully connected layer 6, as I have 6 labels (it used to be 1).
- I now create a one-hot vector and pass that as a label into my HDF5 dataset as follows:
f['label'] = numpy.array([1, 0, 0, 0, 0, 0])
Trying to run my network now returns
Check failed: hdf_blobs_[i]->shape(0) == num (6 vs. 1)
After some research online, I reshaped the vector to a 1x6 vector. This led to the following error:
Check failed: outer_num_ * inner_num_ == bottom[1]->count() (40 vs. 240)
Number of labels must match number of predictions; e.g., if softmax axis == 1
and prediction shape is (N, C, H, W), label count (number of labels)
must be N*H*W, with integer values in {0, 1, ..., C-1}.
My idea is to add 1 label per data item (image), and in my train.prototxt I create batches. Shouldn't this create the correct batch size?
Since you moved from regression to classification, you need to output not a scalar to compare with the "label", but rather a probability vector of length num-labels to compare with the discrete class "label". You need to change the num_output parameter of the layer before "SoftmaxWithLoss" from 1 to num-labels.
I believe you are currently accessing uninitialized memory, and I would expect caffe to crash sooner or later in this case.
Update:
You made two changes: num_output 1 --> 6, and you also changed your input label from a scalar to a vector.
The first change was the only one you needed for "SoftmaxWithLoss".
Do not change the label from a scalar to a "hot-vector".
Why?
Because "SoftmaxWithLoss" basically looks at the 6-vector prediction you output, interpret the ground-truth label as index and looks at -log(p[label]): the closer p[label] is to 1 (i.e., you predicted high probability for the expected class) the lower the loss. Making a prediction p[label] close to zero (i.e., you incorrectly predicted low probability for the expected class) then the loss grows fast.
Using a "hot-vector" as ground-truth input label, may give rise to multi-category classification (does not seems like the task you are trying to solve here). You may find this SO thread relevant to that particular case.

How do I interpret pycaffe classify.py output?

I created a GoogLeNet model via Nvidia DIGITS with two classes (called positive and negative).
If I classify an image with DIGITS, it shows me a nice result like positive: 85.56% and negative: 14.44%.
If I pass that model into pycaffe's classify.py with the same image, I get a result like array([[ 0.38978559, -0.06033826]], dtype=float32).
So, how do I read/interpret this result? How do I calculate the confidence levels (not sure if this is the right term) shown by DIGITS from the results returned by classify.py?
This issue led me to the solution.
As the log shows, the network produces three outputs. Classifier#classify only returns the first output. So e.g. by changing predictions = out[self.outputs[0]] to predictions = out[self.outputs[2]], I get the desired values.
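If the two numbers classify.py returns are raw scores (logits) rather than probabilities, applying a softmax yourself recovers confidence percentages of the kind DIGITS displays. A minimal NumPy sketch, assuming the output really is pre-softmax (if the deploy network ends in a Softmax layer, the values are already probabilities):
import numpy as np

def softmax(scores):
    z = scores - np.max(scores, axis=-1, keepdims=True)  # stability shift
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

raw = np.array([[0.38978559, -0.06033826]], dtype=np.float32)
print(softmax(raw) * 100)  # per-class confidence percentages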

Using SURF algorithm to match objects on MATLAB

The objective is to see whether two images, each of which captures one object, match.
The stored object/image, used as a baseline:
item1 (this is what is being matched against in the code)
The object/image that needs to be matched against the baseline:
input (I need to see if this matches what is stored)
My method:
Convert the images to gray-scale.
Extract SURF interest points.
Obtain features.
Match features.
Get the 50 strongest features.
Match the strongest features between the two images.
Take the ratio: number of features matched / number of strongest features (which is 50).
If I have two images of the same object (two images taken separately with a camera), ideally the ratio should be near 1, or near 100%.
However, this is not the case; the best ratio I am getting is near 0.5, or even worse, 0.3.
I am aware that SURF detectors and features can be used in neural networks or in a statistics-based approach. I believe I have taken the statistics-based approach to some extent by using the 50 strongest features.
Is there something I am missing? What do I add to this, or how do I improve it? Please give me a point to start from.
%Clearing the workspace and all variables
clc;
clear;
%ITEM 1
item1 = imread('Loreal.jpg'); %retrieve item 1 and digitize it
item1Grey = rgb2gray(item1); %convert to grayscale, 2-dimensional matrix
item1KP = detectSURFFeatures(item1Grey,'MetricThreshold',600); %get SURF detectors (interest points)
strong1 = item1KP.selectStrongest(50);
[item1Features, item1Points] = extractFeatures(item1Grey, strong1,'SURFSize',128); % using SURFSize of 128
%INPUT: acquire image
input = imread('MakeUp1.jpg'); %retrieve input and digitize it
inputGrey = rgb2gray(input); %convert to grayscale, 2-dimensional matrix
inputKP = detectSURFFeatures(inputGrey,'MetricThreshold',600); %get SURF detectors (interest points)
strongInput = inputKP.selectStrongest(50);
[inputFeatures, inputPoints] = extractFeatures(inputGrey, strongInput,'SURFSize',128); % using SURFSize of 128
pairs = matchFeatures(item1Features, inputFeatures, 'MaxRatio',1); %matching SURF features
totalFeatures = length(item1Features); %baseline number of features
numPairs = length(pairs); %the number of matched pairs
percentage = numPairs/50;
if percentage >= 0.49
    disp('We have this');
else
    disp('We do not have this');
    disp(percentage);
end
The baseline image
The input image
I would try not doing selectStrongest and not setting MaxRatio. Just call matchFeatures with the default options and compare the number of resulting matches.
The default behavior of matchFeatures is to use the ratio test to exclude ambiguous matches. So the number of matches it returns may be a good indicator of the presence or absence of the object in the scene.
If you want to try something more sophisticated, take a look at this example.
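For reference, a rough Python/OpenCV sketch of the same idea (counting ratio-test matches and using that count as the presence indicator). It uses ORB instead of SURF, since SURF is patented and absent from stock OpenCV builds; the file names are taken from the question:
import cv2

img1 = cv2.imread('Loreal.jpg', cv2.IMREAD_GRAYSCALE)   # baseline
img2 = cv2.imread('MakeUp1.jpg', cv2.IMREAD_GRAYSCALE)  # input

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Lowe's ratio test: keep a match only if it is clearly better than the
# second-best candidate (matchFeatures applies this test by default).
bf = cv2.BFMatcher(cv2.NORM_HAMMING)
good = []
for pair in bf.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good.append(pair[0])
print(len(good))  # more surviving matches -> object more likely present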

Matlab - Dilation function alternative

I'm looking through various online sources trying to learn some new stuff with MATLAB.
I came across a dilation function, shown below:
function rtn = dilation(in)
h = size(in,1);
l = size(in,2);
rtn = zeros(h,l,3);
% layer 1: image shifted up by one row (last row duplicated)
% layer 2: original image
% layer 3: image shifted down by one row (first row duplicated)
rtn(:,:,1) = [in(2:h,:); in(h,:)];
rtn(:,:,2) = in;
rtn(:,:,3) = [in(1,:); in(1:h-1,:)];
rtn_two = max(rtn,[],3); % vertical dilation: max over the three layers
% repeat column-wise: shifted left, original, shifted right
rtn(:,:,1) = [rtn_two(:,2:l), rtn_two(:,l)];
rtn(:,:,2) = rtn_two;
rtn(:,:,3) = [rtn_two(:,1), rtn_two(:,1:l-1)];
rtn = max(rtn,[],3); % horizontal dilation
The parameter it takes is max(img,[],3), where img is an image.
I was wondering if anyone could shed some light on what this function does and whether there's a better (or less confusing) way to do it. Apart from a small wiki entry, I can't seem to find any documentation, hence asking for your help.
Could this be achieved with the imdilate function, maybe?
What this is doing is creating two copies of the image shifted by one pixel up/down (with the last/first row duplicated to preserve size), then taking the max value of the 3 images at each point to create a vertically dilated image. Since the shifted copies and the original are layered in a 3-d matrix, max(img,[],3) 'flattens' the 3 layers along the 3rd dimension. It then repeats this column-wise for the horizontal part of the dilation.
For a trivial image:
00100
20000
00030
Step 1:
(:,:,1) (:,:,2) (:,:,3) max
20000 00100 00100 20100
00030 20000 00100 20130
00030 00030 20000 20030
Step 2:
(:,:,1) (:,:,2) (:,:,3) max
01000 20100 22010 22110
01300 20130 22013 22333
00300 20030 22003 22333
You're absolutely correct this would be simpler with the Image Processing Toolbox:
rtn = imdilate(in, ones(3));
With the original code, dilating by more than one pixel would require multiple iterations, and because it operates one dimension at a time it's limited to square (or possibly rectangular, with a bit of modification) structuring elements.
Your function replaces each element with the maximum value in the corresponding 3*3 neighbourhood. By creating a 3D matrix, the function aligns each element with two shifted copies of itself, which equivalently achieves the 3*3 kernel. This alignment is done twice, to take the maximum along each column and along each row respectively.
You can generate a simple matrix to compare the result with imdilate:
a=magic(8)
rtn = dilation(a)
b=imdilate(a,ones(3))
Besides imdilate, you can also use
c=ordfilt2(a,9,ones(3))
to get the same result (ordfilt2(a,9,ones(3)) implements a 3-by-3 maximum filter).
EDIT
You can also try imdilate on a 3D image:
a(:,:,1)=magic(8);
a(:,:,2)=magic(8);
a(:,:,3)=magic(8);
mask = true(3,3,3);
mask(2,2,2) = false;
d = imdilate(a,mask);
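For comparison outside MATLAB, the whole function is just a maximum filter; a minimal Python/SciPy sketch, where size=3 plays the role of ones(3) and mode='nearest' replicates edge pixels the way the hand-rolled version duplicates border rows/columns:
import numpy as np
from scipy.ndimage import maximum_filter

a = np.arange(64).reshape(8, 8)
# grayscale dilation with a flat 3x3 structuring element
d = maximum_filter(a, size=3, mode='nearest')
print(d)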