I want to make a Haar cascade so that I can use it to detect an object in opencv-python. For example, I want to detect a watch. I tried making a cascade using Cascade Trainer GUI, but it isn't giving me the expected results.

Before training, search the internet: the object you want to detect may already have a trained cascade, in which case you don't need to train one yourself.
For example, you want to detect a watch. The Haar cascade file is available here.
I tested the file to see whether it works; this is the code I used:
import cv2

w_cascade = cv2.CascadeClassifier('watchcascade10stage.xml')
font = cv2.FONT_HERSHEY_SIMPLEX
cap = cv2.VideoCapture(0)
while True:
    ret, img = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # detection parameters are left at OpenCV's defaults here; tune scaleFactor / minNeighbors for your camera
    watches = w_cascade.detectMultiScale(image=gray)
    for (x, y, w, h) in watches:
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 2)
        cv2.putText(img, 'Watch', (x - w, y - h), font, 0.5, (11, 255, 255), 2, cv2.LINE_AA)
    cv2.imshow('img', img)
    k = cv2.waitKey(1) & 0xff
    if k == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
You can find other tutorials by searching the internet. For instance, start with this video.

The thing is, Haar features on their own are neither a detector nor a classifier; they are a feature extractor. OpenCV's CascadeClassifier already pairs them with a boosted classifier cascade and a sliding-window search, but if you want to build the pipeline yourself, you can use the Haar features together with an SVM (support vector machine) for classification and implement a sliding window to detect watches.
The steps are as follows.
1 Extract image patches using a sliding window.
2 Pass each patch to an SVM trained on Haar features.
3 Draw a rectangle if the prediction is true.
I recommend this tutorial series. Do reach out to me if you still need help.
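The three steps above can be sketched roughly like this in Python. This is only an illustration: the names sliding_windows, detect and is_object are made up for this example, and is_object stands in for whatever trained classifier (for instance an SVM on Haar features) you end up with. The window size and step are arbitrary assumptions.

```python
import numpy as np

def sliding_windows(image, win_w, win_h, step):
    """Yield (x, y, patch) for every window that fits fully inside the image."""
    h, w = image.shape[:2]
    for y in range(0, h - win_h + 1, step):
        for x in range(0, w - win_w + 1, step):
            yield x, y, image[y:y + win_h, x:x + win_w]

def detect(image, is_object, win_w=24, win_h=24, step=8):
    """Return (x, y, w, h) boxes for which the classifier callback returns True.

    is_object is a hypothetical callable: patch -> bool.
    """
    return [(x, y, win_w, win_h)
            for x, y, patch in sliding_windows(image, win_w, win_h, step)
            if is_object(patch)]

# toy usage: the "classifier" just checks whether the patch is mostly bright
img = np.zeros((8, 8))
img[4:8, 4:8] = 1.0
boxes = detect(img, lambda p: p.mean() > 0.5, win_w=4, win_h=4, step=4)
```

Each returned box can then be drawn with cv2.rectangle, as in the first answer. A real detector would also repeat this over several image scales.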


How to separate human body from background in an image

I have been trying to separate the human body in an image from the background, but all the methods I have seen don't seem to work very well for me.
I have collected the following images;
The image of the background
The image of the background with the person in it.
Now I want to cut out the person from the background.
I tried subtracting the image of the background from the image with the person using res = cv2.subtract(background, foreground) (I am new to image processing).
Background subtraction methods in OpenCV such as cv2.BackgroundSubtractorMOG2() only work with videos or image sequences, and the contour detection methods I have seen are only for solid shapes.
GrabCut doesn't work well for me either, because I would like to automate the process.
Given the images I have (Image of the background and image of the background with the person in it), is there a method of cutting the person out from the background?
I wouldn't recommend a neural net for this problem. That's a lot of work for something like this where you have a known background. I'll walk through the steps I took to do the background segmentation on this image.
First I converted to the LAB color space to get some light-resistant channels to work with. I did a simple subtraction of foreground and background and combined the a and b channels.
You can see that there is still significant color change in the background, even with a less light-sensitive color channel. This is likely due to the auto white balance on the camera: some of the background colors change when you step into view.
The next step I took was thresholding this image. The optimal threshold values may not always be the same; you'll have to adjust to a range that works well for your set of photos.
I used OpenCV's findContours function to get the segmentation points of each blob and filtered the available contours by size. I set a size threshold of 15000. For reference, the person in the image had a pixel area of 27551.
Then it's just a matter of cropping out the contour.
This technique works for any good thresholding strategy. If you can improve the consistency of your pictures by turning off auto settings and ensure good contrast of the person against the wall then you can use simpler thresholding strategies and get good results.
Just for fun:
I forgot to add in the code I used:
import cv2
import numpy as np
# rescale values
def rescale(img, orig, new):
    img = np.divide(img, orig)
    img = np.multiply(img, new)
    img = img.astype(np.uint8)
    return img
# get abs(diff) of all hue values
def diff(bg, fg):
    # do both sides
    lh = bg - fg
    rh = fg - bg
    # pick minimum; this works because of uint wrapping
    low = np.minimum(lh, rh)
    return low
# load image
bg = cv2.imread("back.jpg");
fg = cv2.imread("person.jpg");
fg_original = fg.copy();
# blur
bg = cv2.blur(bg,(5,5));
fg = cv2.blur(fg,(5,5));
# convert to lab
bg_lab = cv2.cvtColor(bg, cv2.COLOR_BGR2LAB);
fg_lab = cv2.cvtColor(fg, cv2.COLOR_BGR2LAB);
bl, ba, bb = cv2.split(bg_lab);
fl, fa, fb = cv2.split(fg_lab);
# subtract
d_b = diff(bb, fb);
d_a = diff(ba, fa);
# rescale for contrast
d_b = rescale(d_b, np.max(d_b), 255);
d_a = rescale(d_a, np.max(d_a), 255);
# combine
combined = np.maximum(d_b, d_a);
# threshold
# check your threshold range, this will work for
# this image, but may not work for others
# in general: having a strong contrast with the wall makes this easier
thresh = cv2.inRange(combined, 70, 255);
# opening and closing
kernel = np.ones((3,3), np.uint8);
# closing
thresh = cv2.dilate(thresh, kernel, iterations = 2);
thresh = cv2.erode(thresh, kernel, iterations = 2);
# opening
thresh = cv2.erode(thresh, kernel, iterations = 2);
thresh = cv2.dilate(thresh, kernel, iterations = 3);
# contours
# [-2] picks the contour list under both OpenCV 3 (three return values) and OpenCV 4 (two)
contours = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2]
# filter contours by size
big_cntrs = []
marked = fg_original.copy()
for contour in contours:
    area = cv2.contourArea(contour)
    if area > 15000:
        big_cntrs.append(contour)
cv2.drawContours(marked, big_cntrs, -1, (0, 255, 0), 3)
# create a mask of the contoured image
mask = np.zeros_like(fb);
mask = cv2.drawContours(mask, big_cntrs, -1, 255, -1);
# erode mask slightly (boundary pixels on wall get color shifted)
mask = cv2.erode(mask, kernel, iterations = 1);
# crop out
out = np.zeros_like(fg_original) # Extract out the object and place into output image
out[mask == 255] = fg_original[mask == 255];
# show
cv2.imshow("combined", combined)
cv2.imshow("thresh", thresh)
cv2.imshow("marked", marked)
# cv2.imshow("masked", mask)
cv2.imshow("out", out)
cv2.waitKey(0)  # keep the windows open until a key is pressed
Since it is very easy to find datasets containing many human bodies, I suggest you implement neural network segmentation techniques to extract the human body accurately. Please check this link to see a similar example.

RGB Depth Alignment [duplicate]

I am trying to align two images, one RGB and one depth, using MATLAB. Please note that I have checked several places for this, like here and here, which require a Kinect device, and here and here, which say that camera parameters are required for calibration. I was also advised to use epipolar geometry to match the two images, though I do not know how. The dataset I am referring to is the RGB-D-T face dataset. One example is shown below:
The ground truth, i.e. the bounding boxes that specify the face region of interest, is already provided, and I use it to crop only the face regions. The MATLAB code is shown below:
I = imread('1.jpg');
I1 = imcrop(I,[218,198,158,122]);
I2 = imcrop(I,[243,209,140,108]);
figure, subplot(1,2,1),imshow(I1);
subplot(1,2,2),imshow(I2);
The two cropped images rgb and depth are shown below :
Is there any way to register/align the images? I took the hint from here, where a basic Sobel operator is applied to both the RGB and depth images to generate edge maps, after which keypoints need to be generated for matching. The edge maps for both images are shown here.
However, they are so noisy that I do not think we will be able to do keypoint matching on these images.
Can anybody suggest some algorithms in MATLAB to do this?
This answer is based on my previous answer:
Does Kinect Infrared View Have an offset with the Kinect Depth View
I manually cropped your input image to separate the color and depth images (as my program needs them separated). This could change the offset by a few pixels. Also, as I do not have the depths (the depth image is only 8-bit because it is stored as grayscale RGB), the depth accuracy I work with is very poor, see:
So my results are negatively affected by all this. Anyway, here is what you need to do:
determine FOV for both images
So find some measurable feature visible in both images. The bigger it is, the more accurate the result. For example, I chose these:
form a point cloud or mesh
I use the depth image as the reference, so my point cloud is in its FOV. As I do not have the distances but 8-bit values instead, I converted them to a distance by multiplying by a constant. I scan the whole depth image and create a point in my point cloud array for every pixel. Then I convert the depth pixel coordinate to the color image FOV and copy its color too. Something like this (in C++):
picture rgb,zed; // your input images
struct pnt3d { float pos[3]; DWORD rgb; pnt3d(){}; pnt3d(pnt3d& a){ *this=a; }; ~pnt3d(){}; pnt3d* operator = (const pnt3d *a) { *this=*a; return this; }; /*pnt3d* operator = (const pnt3d &a) { ...copy... return this; };*/ };
pnt3d **xyz=NULL; int xs,ys,ofsx=0,ofsy=0;
void copy_images()
    {
    int x,y,x0,y0;
    float xx,yy;
    pnt3d *p;
    for (y=0;y<ys;y++)
     for (x=0;x<xs;x++)
        {
        p=&xyz[y][x];
        // copy point from depth image (fills p->pos[] from zed)
        // convert depth image x,y to color image space (FOV correction, sets xx,yy)
        x0=xx; x0+=ofsx;
        y0=yy; y0+=ofsy;
        // copy color from rgb image if in range
        p->rgb=0x00000000; // black
        if ((x0>=0)&&(x0<rgb.xs))
         if ((y0>=0)&&(y0<rgb.ys))
          p->rgb=rgb2bgr(rgb.p[y0][x0].dd); // OpenGL has reverse RGB order than my image
        }
    }
where **xyz is my point cloud, a 2D array allocated to the depth image resolution. picture is my image class for DIP, so here are some relevant members:
xs,ys is the image resolution in pixels
p[ys][xs] is direct pixel access to the image as a union of DWORD dd; BYTE db[4]; so I can access the color as a single 32-bit variable or each color channel separately.
rgb2bgr(DWORD col) just reorders the color channels from RGB to BGR.
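For readers who prefer NumPy, the same idea (one 3D point per depth pixel, with the color looked up in the RGB image after an FOV scale and a pixel offset) might be sketched as below. This is not the C++ code above: build_point_cloud, sx, sy and z_scale are names and parameters assumed for this illustration only, and the FOV correction is modelled as a plain scale about the image centers.

```python
import numpy as np

def build_point_cloud(depth, rgb, ofsx=0, ofsy=0, sx=1.0, sy=1.0, z_scale=1.0):
    """Return (pos, col): per-pixel 3D points and the colors sampled for them.

    sx, sy  : assumed FOV scale factors between depth and color cameras
    z_scale : assumed constant converting 8-bit depth values to a distance
    """
    ys, xs = depth.shape
    y, x = np.mgrid[0:ys, 0:xs]
    pos = np.stack([x, y, depth * z_scale], axis=-1).astype(np.float32)
    # map each depth pixel into the color image: scale about the centers, then shift
    x0 = np.rint((x - xs / 2) * sx + rgb.shape[1] / 2).astype(int) + ofsx
    y0 = np.rint((y - ys / 2) * sy + rgb.shape[0] / 2).astype(int) + ofsy
    ok = (x0 >= 0) & (x0 < rgb.shape[1]) & (y0 >= 0) & (y0 < rgb.shape[0])
    col = np.zeros((ys, xs, 3), dtype=rgb.dtype)  # black where out of range
    col[ok] = rgb[y0[ok], x0[ok]]
    return pos, col

# toy usage with tiny images
depth = np.array([[10, 20, 30], [40, 50, 60]], dtype=np.uint8)
rgb = np.arange(18, dtype=np.uint8).reshape(2, 3, 3) + 1
pos, col = build_point_cloud(depth, rgb, ofsx=1)
```

Changing ofsx and ofsy and rebuilding the cloud corresponds to the manual alignment step described further down.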
render it
I use OpenGL for this so here the code:
glBegin(GL_QUADS);
for (int y0=0,y1=1;y1<ys;y0++,y1++)
 for (int x0=0,x1=1;x1<xs;x0++,x1++)
    {
    float z,z0,z1;
    z=xyz[y0][x0].pos[2]; z0=z; z1=z0;
    z=xyz[y0][x1].pos[2]; if (z0>z) z0=z; if (z1<z) z1=z;
    z=xyz[y1][x0].pos[2]; if (z0>z) z0=z; if (z1<z) z1=z;
    z=xyz[y1][x1].pos[2]; if (z0>z) z0=z; if (z1<z) z1=z;
    if (z0 <=0.01) continue;
    if (z1 >=3.90) continue; // 3.972 for everything above ~3.95 m and 4.000 if nothing is captured at all
    if (z1-z0>=0.10) continue;
    glColor4ubv((BYTE* )&xyz[y0][x0].rgb); glVertex3fv(xyz[y0][x0].pos);
    glColor4ubv((BYTE* )&xyz[y0][x1].rgb); glVertex3fv(xyz[y0][x1].pos);
    glColor4ubv((BYTE* )&xyz[y1][x1].rgb); glVertex3fv(xyz[y1][x1].pos);
    glColor4ubv((BYTE* )&xyz[y1][x0].rgb); glVertex3fv(xyz[y1][x0].pos);
    }
glEnd();
You need to add the OpenGL initialization, camera settings, etc., of course. Here is the unaligned result:
align it
If you noticed, I added the ofsx,ofsy variables to copy_images(). This is the offset between the cameras. I change them by 1 pixel on arrow keystrokes and then call copy_images and render the result. This way I found the offset manually very quickly:
As you can see, the offset is +17 pixels in the x axis and +4 pixels in the y axis. Here is a side view to better see the depths:
Hope it helps a bit.
Well, I tried doing it after reading lots of blogs. I am still not sure whether I am doing it correctly. Please feel free to comment if something is amiss. For this I used a MathWorks File Exchange submission that can be found here: the ginputc function.
The MATLAB code is as follows:
clc; clear all; close all;
% no of keypoint
N = 7;
I = imread('2.jpg');
I = rgb2gray(I);
[Gx, Gy] = imgradientxy(I, 'Sobel');
[Gmag, ~] = imgradient(Gx, Gy);
figure, imshow(Gmag, [ ]), title('Gradient magnitude')
I = Gmag;
[x,y] = ginputc(N, 'Color' , 'r');
matchedpoint1 = [x y];
J = imread('2.png');
[Gx, Gy] = imgradientxy(J, 'Sobel');
[Gmag, ~] = imgradient(Gx, Gy);
figure, imshow(Gmag, [ ]), title('Gradient magnitude')
J = Gmag;
[x, y] = ginputc(N, 'Color' , 'r');
matchedpoint2 = [x y];
[tform,inlierPtsDistorted,inlierPtsOriginal] = estimateGeometricTransform(matchedpoint2,matchedpoint1,'similarity');
figure; showMatchedFeatures(J,I,inlierPtsOriginal,inlierPtsDistorted);
title('Matched inlier points');
I = imread('2.jpg'); J = imread('2.png');
I = rgb2gray(I);
outputView = imref2d(size(I));
Ir = imwarp(J,tform,'OutputView',outputView);
figure; imshow(Ir, []);
title('Recovered image');
figure,imshowpair(I,J,'diff'),title('Difference with original');
figure,imshowpair(I,Ir,'diff'),title('Difference with restored');
Step 1
I used the Sobel edge detector to extract the edges of both the depth and RGB images and then applied a threshold to get the edge map. I work primarily with the gradient magnitude only. This gives me two images like these:
Step 2
Next I use the ginput or ginputc function to mark keypoints on both images. I establish the correspondence between the points by hand beforehand. I tried using SURF features, but they do not work well on depth images.
Step 3
Use estimateGeometricTransform to get the transformation matrix tform, and then use this matrix to recover the original position of the moved image. The next set of images tells this story.
Granted, I still believe the results could be improved further if the keypoints in both images were selected more judiciously. I also think #Specktre's method is better. I just noticed that my answer uses a different image pair from the one in the question. Both images come from the same dataset, to be found here: vap rgb-d-t dataset.
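To make Step 3 a bit more concrete, here is a rough NumPy sketch of fitting a 2D similarity transform (scale, rotation, translation) to hand-picked point pairs. It is a plain least-squares fit, not what estimateGeometricTransform does internally (that function also rejects outliers), and the names fit_similarity and apply_transform are invented for this example.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2D similarity mapping src -> dst, returned as a 3x3 matrix.

    Model: x' = a*x - b*y + tx,  y' = b*x + a*y + ty
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    # dst.reshape(-1) interleaves x' and y', matching the row order of A
    a, b, tx, ty = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty], [0.0, 0.0, 1.0]])

def apply_transform(T, pts):
    """Apply the 3x3 transform T to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ T[:2, :2].T + T[:2, 2]

# toy usage: recover a known transform from four matched points
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
T_true = np.array([[0.0, -2.0, 17.0], [2.0, 0.0, 4.0], [0.0, 0.0, 1.0]])
dst = apply_transform(T_true, src)
T = fit_similarity(src, dst)
```

At least two point pairs are needed; with noisy manual clicks, more pairs give a more stable fit.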

Specifying the type of the vehicle detected in a video frame

I am trying to detect vehicles in a video frame; more specifically, to detect vehicles and then count the number detected.
I am using the MATLAB code from MathWorks: Open this link
You can find more details in the link above.
Assume that we extract a specific frame of a video. What I need is to extend the code with lines that further specify the type of each detected vehicle (for example, whether it is a car or a truck).
Concerning the original code used by MathWorks:
1) Import the video (to be processed) and initialize a foreground detector:
The motivation is to make the video easier to process. Instead of processing the entire video, we can process a frame in which all the moving objects are segmented from the background. The foreground detector requires a certain number of video frames in order to initialize the Gaussian mixture model. This example uses the first 50 frames to initialize three Gaussian modes in the mixture model.
foregroundDetector = vision.ForegroundDetector('NumGaussians', 3, ...
'NumTrainingFrames', 50);
videoReader = vision.VideoFileReader('visiontraffic.avi');
for i = 1:150
    frame = step(videoReader); % read the next video frame
    foreground = step(foregroundDetector, frame);
end
2) Detecting vehicles in the video frame:
Unfortunately, the foreground detector is not perfect, since its output contains some noise. So it is worth applying a morphological opening to remove that noise:
se = strel('square', 3);
filteredForeground = imopen(foreground, se);
figure; imshow(filteredForeground); title('Clean Foreground');
3) Next, we find the bounding box of each connected component corresponding to a moving car by using a vision.BlobAnalysis object. The object further filters the detected foreground by rejecting blobs that contain fewer than 150 pixels.
blobAnalysis = vision.BlobAnalysis('BoundingBoxOutputPort', true, ...
'AreaOutputPort', false, 'CentroidOutputPort', false, ...
'MinimumBlobArea', 150);
bbox = step(blobAnalysis, filteredForeground);
4) Let's highlight each detected vehicle with a rectangular box:
result = insertShape(frame, 'Rectangle', bbox, 'Color', 'green');
5) Counting the number of vehicles that appear in the video frame:
numCars = size(bbox, 1);
result = insertText(result, [10 10], numCars, 'BoxOpacity', 1, ...
'FontSize', 14);
I would very much appreciate your help.
This problem is an active research area, and there are many possible approaches. One possibility is to train a classifier to distinguish a car from a truck. You can use this example showing how to classify digits using HOG features and an SVM classifier to get started.

Plot depth image in matlab

I have an RGB-D image and am trying to get a 3D visualization in matlab. Currently I am doing:
depth = imread('img_031_depth.png');
depth = double(depth);
img = imread('img_031.png');
surf(depth, img, 'FaceColor', 'texturemap', 'EdgeColor', 'none' )
view(158, 38)
Which gives me an image like:
I have two questions:
1) How can I save the image without it blurring as above?
2) As you can see, some edges show lines going to zero (e.g. the top of the coffee cup). I would like to remove these.
What I'm trying to produce is a 3D-looking point cloud; as these are only 2.5D, I must show them from the right angle.
Any help is appreciated.
EDIT: added images (note depth image needs to be normalized for visualization)
If you are only interested in a point cloud, you might want to consider scatter3.
You can select which points to plot (discard those with depth == 0).
You need to have explicit x-y coordinates though.
[y x] = ndgrid( 1:size(img,1), 1:size(img,2) );
sel = depth > 0 ; % which points to plot
% "flatten" the matrices for scatter plot
x = x(:);
y = y(:);
img = reshape( img, [], 3 );
depth = depth(:);
scatter3( x(sel), y(sel), depth(sel), 20, img( sel, : ), 'filled' );
view(158, 38)
Edit: sampled version
[y x] = ndgrid( 1:2:size(img,1), 1:2:size(img,2) );
sel = depth( 1:2:end, 1:2:end ) > 0;
x = x(:);
y = y(:);
img = reshape( img( 1:2:end, 1:2:end, : ), [], 3 );
depth = depth( 1:2:end, 1:2:end );
scatter3( x(sel), y(sel), depth(sel), 20, img( sel, : ), 'filled' );
view( 158, 38 );
Alternatively, you can directly manipulate sel mask.
I suggest you first recover x = z*u/f and y = z*v/f to obtain x, y, z, where f is your camera focal length and u, v are pixel coordinates measured from the image center;
then apply whatever rotation and translation you want before displaying them: [x’,y’,z’] = R[x, y, z] + t;
then project them back using col = x*f/z + w/2, row = h/2 - y*f/z to get a simple image that you can display fast. You can add a depth buffer to the last operation to guarantee proper occlusions: write the depth at each pixel and overwrite a pixel only if the new z is smaller (that is, the new point is closer to the viewer). The resulting image will still have holes due to the nature of point clouds. You can interpolate in those holes, but that means tracing a ray from every pixel in the image to your point cloud and finding the closest neighbor to the ray, which probably takes forever in MATLAB.
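A rough NumPy sketch of that back-project / transform / re-project loop with a depth buffer could look like the following. The function name reproject_depth and its arguments are assumptions for illustration; it takes u = col - w/2 and v = h/2 - row so that it matches the projection formulas above, and it leaves holes as zeros rather than interpolating them.

```python
import numpy as np

def reproject_depth(depth, f, R=None, t=None):
    """Back-project a depth image, apply [x',y',z'] = R[x,y,z] + t, re-render with a z-buffer."""
    R = np.eye(3) if R is None else np.asarray(R, dtype=float)
    t = np.zeros(3) if t is None else np.asarray(t, dtype=float)
    h, w = depth.shape
    rows, cols = np.nonzero(depth > 0)            # skip pixels without depth
    z = depth[rows, cols].astype(np.float64)
    u = cols - w / 2.0                            # pixel coordinates relative to the center
    v = h / 2.0 - rows
    pts = np.stack([z * u / f, z * v / f, z], axis=1)  # x = z*u/f, y = z*v/f
    pts = pts @ R.T + t
    pts = pts[pts[:, 2] > 0]                      # keep points in front of the camera
    col = np.rint(pts[:, 0] * f / pts[:, 2] + w / 2.0).astype(int)
    row = np.rint(h / 2.0 - pts[:, 1] * f / pts[:, 2]).astype(int)
    inside = (col >= 0) & (col < w) & (row >= 0) & (row < h)
    out = np.full((h, w), np.inf)
    # z-buffer: each pixel keeps the smallest z written to it
    np.minimum.at(out, (row[inside], col[inside]), pts[inside, 2])
    out[np.isinf(out)] = 0                        # holes stay at 0
    return out

# toy usage: with the identity motion the depth image is reproduced
depth = np.array([[1.0, 2.0, 0.0], [3.0, 4.0, 5.0]])
same = reproject_depth(depth, f=500.0)
```

The focal length f must come from your camera intrinsics; 500.0 here is just a placeholder value.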
I am also doing some 3D image restoration and reconstruction. The first question is easy. Your photo was taken by a camera, so you need to transform the positions to the camera coordinate system. In other words, you need to know the intrinsic parameters of your camera; otherwise you can never recover the geometry from a single image. Google 'kinect intrinsic value' to get the focal length etc.
Also, change your view.
Try this! And if it's not working, ask again.

iPhone colour Image analysis

I am looking for some ideas about an approach that will let me analyze an image, and determine how greenISH or brownISH or whiteISH it is... I am emphasizing ISH here because, I am interested in capturing ALL the shades of these colours. So far, I have done the following:
I have my UIImage, I have the CGImageRef and I actually have the colour of the pixel itself (its RGB and alpha). What I don't know is how to quantify this and determine all the green shades, blues, browns, yellows, purples etc. So I can process each and every pixel and determine its basic RGB, but I need some help in quantifying the colours over a whole image.
Thanks for your ideas...
One fairly good solution is to switch from RGB colour space to one of the Y colour spaces, such as YUV, YCrCb or any of those. In all cases the Y channel represents brightness and the other two channels together represent colour, relative to brightness. You probably want to factor brightness out, possibly with the caveat that all colours below a certain darkness are to be excluded, so getting Y separately is a helpful first step in itself.
Converting from RGB to YUV is achieved with a simple linear combination. Straight from Wikipedia and a thousand other sources:
y = 0.299*r + 0.587*g + 0.114*b;
u = -0.14713*r - 0.28886*g + 0.436*b;
v = 0.615*r - 0.51499*g - 0.10001*b;
Assuming you're keeping r, g and b in the range [0, 1], your first test might be:
if(y < 0.05)
{
    // this colour is very dark, so it's considered to be as
    // far as we allow from any colour we're interested in
}
To decide how close your colour then is to, say, green, work out the u and v components of the green you're interested in, as a proportion of the y:
r = b = 0;
g = 1;
y = 0.299*r + 0.587*g + 0.114*b = 0.587;
u = -0.14713*r - 0.28886*g + 0.436*b = -0.28886;
v = 0.615*r - 0.51499*g - 0.10001*b = -0.51499;
proportionOfU = u / y = -0.4921;
proportionOfV = v / y = -0.8773;
Subsequently, work out the proportions of U and V for incoming colours and compare them (e.g. by 2D planar distance) to those you've computed for the reference colour. Closer values are more similar. How you scale and use that metric depends on your application.
Notice that as y goes toward 0, the computed proportions become increasingly less precise because of the limited range of the input data, and are undefined when y is 0. Conceptually that's because all colours look exactly the same when there's no light on them. Checking that y is above at least a certain minimum value is the pragmatic way of working around this issue. This also means that you're not going to get sensible results if you try to say "how black is this picture?", though again that's because of the ambiguity between a surface that doesn't reflect any light and a surface that doesn't have any light falling upon it.
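Putting the pieces together, a small Python sketch of the whole comparison might look like this. The function names and the min_y default of 0.05 are choices made for this example, and r, g, b are assumed to be in the range [0, 1].

```python
import math

def rgb_to_yuv(r, g, b):
    """Linear RGB -> YUV conversion with the coefficients quoted above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b
    v = 0.615 * r - 0.51499 * g - 0.10001 * b
    return y, u, v

def chroma_proportions(r, g, b, min_y=0.05):
    """Return (u/y, v/y), or None when the colour is too dark to judge."""
    y, u, v = rgb_to_yuv(r, g, b)
    if y < min_y:
        return None
    return u / y, v / y

def colour_distance(rgb, reference_rgb, min_y=0.05):
    """2D planar distance between the chroma proportions of two colours."""
    p = chroma_proportions(*rgb, min_y=min_y)
    q = chroma_proportions(*reference_rgb, min_y=min_y)
    if p is None or q is None:
        return None
    return math.hypot(p[0] - q[0], p[1] - q[1])

# toy usage: a darker green is still "green", red is far away
green = (0.0, 1.0, 0.0)
d_dark_green = colour_distance((0.0, 0.5, 0.0), green)
d_red = colour_distance((1.0, 0.0, 0.0), green)
```

To score a whole image, you could run this per pixel and count (or average) the pixels whose distance to each reference colour falls under a threshold you pick for your application.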