How to draw a smile using Vision and its Face landmarks observations? - swift

I am trying to build a smile detector over a real time video (front cam) using Uibezierpath over the screen coordinates by detecting face landmarks using VNDetectFaceLandmarksRequest and "Landmarks.outerlips", calculating Y offset between upper points, without using CoreML ideally - but I seem only able to get the normalised points for the landmark, where these points have their own coordinate system. I'm not sure how to convert each point to the screen coordinate system.
This answer from #Rickster seems to be in the right direction but I'm not able to fully grasp next steps:
How to convert normalized points retrived from VNFaceLandmarkRegion2D
Current output:
Desired result:

Related

Is it possible to find the depth of an internal point of an object using stereo images (or any other method)?

I have image of robot with yellow markers as shown
The yellow points shown are the markers. There are two cameras used to view placed at an offset of 90 degrees. The robot bends in between the cameras. The crude schematic of the setup can be referred.
https://i.stack.imgur.com/aVyDq.png
Using the two cameras I am able to get its 3d co-ordinates of the yellow markers. But, I need to find the 3d-co-oridnates of the central point of the robot as shown.
I need to find the 3d position of the red marker points which is inside the cylindrical robot. Firstly, is it even feasible? If yes, what is the method I can use to achieve this?
As a bonus, is there any literature where they find the 3d location of such internal points which I can refer to (I searched, but could not find anything similar to my ask).
I am welcome to a theoretical solution as well(as long as it assures to find the central point within a reasonable error), which I can later translate to code.
If you know the actual dimensions, or at least, shape (e.g. perfect circle) of the white bands, then yes, it is feasible and possible.
You need to do the following steps, which are quite non trivial to do, and I won't do them here:
Optional but extremely suggested: calibrate your camera, and
undistort it.
find the equation of the projection of a 3D circle into a 2D camera, for any given rotation. You can simplify this by assuming the white line will be completely horizontal. You want some function that takes the parameters that make a circle and a rotation.
Find all white bands in the image, segment them, and make them horizontal (rotate them)
Fit points in the corrected white circle to the equation in (1). That should give you the parameters of the circle in 3d (radious, angle), if you wrote the equation right.
Now that you have an analytic equation of the actual circle (equation from 1 with parameters from 3), you can map any point from this circle (e.g. its center) to the image location. Remember to uncorrect for the rotations in step 2.
This requires understanding of curve fitting, some geometric analytical maths, and decent code skills. Not trivial, but this will provide a solution that is highly accurate.
For an inaccurate solution:
Find end points of white circles
Make line connecting endpoints
Chose center as mid point of this line.
This will be inaccurate because: choosing end points will have more error than fitting an equation with all points, ignores cone shape of view of the camera, ignores geometry.
But it may be good enough for what you want.
I have been able to extract the midpoint by fitting an ellipse to the arc visible to the camera. The centroid of the ellipse is the required midpoint.
There will be wrong ellipses as well, which can be ignored. The steps to extract the ellipse were:
Extract the markers
Binarise and skeletonise
Fit ellipse to the arc (found a matlab function for this)
Get the centroid of the ellipse
hsv_img=rgb2hsv(im);
bin=new_hsv_img(:,:,3)>marker_th; %was chosen 0.35
%skeletonise
skel=bwskel(bin);
%use regionprops to get the pixelID list
stats=regionprops(skel,'all');
for i=1:numel(stats)
el = fit_ellipse(stats(i).PixelList(:,1),stats(i).PixelList(:,2));
ellipse_draw(el.a, el.b, -el.phi, el.X0_in, el.Y0_in, 'g');
The link for fit_ellipse function
Link for ellipse_draw function

Measuring objects in a photo taken by calibrated cameras, knowing the size of a reference object in the photo

I am writing a program that captures real time images from a scene by two calibrated cameras (so the internal parameters of the cameras are known to us). Using two view geometry, I can find the essential matrix and use OpenCV or MATLAB to find the relative position and orientation of one camera with respect to another. Having the essential matrix, it is shown in Hartley and Zisserman's Multiple View Geometry that one can reconstruct the scene using triangulation up to scale. Now I want to use a reference length to determine the scale of reconstruction and resolve ambiguity.
I know the height of the front wall and I want to use it for determining the scale of reconstruction to measure other objects and their dimensions or their distance from the center of my first camera. How can it be done in practice?
Thanks in advance.
Edit: To add more information, I have already done linear trianglation (minimizing the algebraic error) but I am not sure if it is any useful because there is still a scale ambiguity that I don't know how to get rid of it. My ultimate goal is to recognize an object (like a Pepsi can) and separate it in a rectangular area (which is going to be written as a separate module by someone else) and then find the distance of each pixel in this rectangular area, i.e. the region of interest, to the camera. Then the distance from the camera to the object will be the minimum of the distances from the camera to the 3D coordinates of the pixels in the region of interest.
Might be a bit late, but at least for someone struggling with the same staff.
As far as I remember it is actually linear problem. You got essential matrix, which gives you rotation matrix and normalized translation vector specifying relative position of cameras. If you followed Hartley and Zissermanm you probably chose one of the cameras as origin of world coordinate system. Meaning all your triangulated points are in normalized distance from this origin. What is important is, that the direction of every triangulated point is correct.
If you have some reference in the scene (lets say height of the wall), then you just have to find this reference (2 points are enough - so opposite ends of the wall) and calculate "normalization coefficient" (sorry for terminology) as
coeff = realWorldDistanceOf2Points / distanceOfTriangulatedPoints
Once you have this coeff, just mulptiply all your triangulated points with it and you got real world points.
Example:
you know that opposite corners of the wall are 5m from each other. you find these corners in both images, triangulate them (lets call triangulated points c1 and c2), calculate their distance in the "normalized" world as ||c1 - c2|| and get the
coeff = 5 / ||c1 - c2||
and you get real 3d world points as triangulatedPoint*coeff.
Maybe easier option is to have both cameras in fixed relative position and calibrate them together by stereoCalibrate openCV/Matlab function (there is actually pretty nice GUI in Matlab for that) - it returns not just intrinsic params, but also extrinsic. But I don't know if this is your case.

Verify that camera calibration is still valid

How do you determine that the intrinsic and extrinsic parameters you have calculated for a camera at time X are still valid at time Y?
My idea would be
to use a known calibration object (a chessboard) and place it in the camera's field of view at time Y.
Calculate the chessboard corner points in the camera's image (at time Y).
Define one of the chessboard corner points as world origin and calculate the world coordinates of all remaining chessboard corners based on that origin.
Relate the coordinates of 3. with the camera coordinate system.
Use the parameters calculated at time X to calculate the image points of the points from 4.
Calculate distances between points from 2. with points from 5.
Is that a clever way to go about it? I'd eventually like to implement it in MATLAB and later possibly openCV. I think I'd know how to do steps 1)-2) and step 6). Maybe someone can give a rough implementation for steps 2)-5). Especially I'd be unsure how to relate the "chessboard-world-coordinate-system" with the "camera-world-coordinate-system", which I believe I would have to do.
Thanks!
If you have a single camera you can easily follow the steps from this article:
Evaluating the Accuracy of Single Camera Calibration
For achieving step 2, you can easily use detectCheckerboardPoints function from MATLAB.
[imagePoints, boardSize, imagesUsed] = detectCheckerboardPoints(imageFileNames);
Assuming that you are talking about stereo-cameras, for stereo pairs, imagePoints(:,:,:,1) are the points from the first set of images, and imagePoints(:,:,:,2) are the points from the second set of images. The output contains M number of [x y] coordinates. Each coordinate represents a point where square corners are detected on the checkerboard. The number of points the function returns depends on the value of boardSize, which indicates the number of squares detected. The function detects the points with sub-pixel accuracy.
As you can see in the following image the points are estimated relative to the first point that covers your third step.
[The image is from this page at MATHWORKS.]
You can consider point 1 as the origin of your coordinate system (0,0). The directions of the axes are shown on the image and you know the distance between each point (in the world coordinate), so it is just the matter of depth estimation.
To find a transformation matrix between the points in the world CS and the points in the camera CS, you should collect a set of points and perform an SVD to estimate the transformation matrix.
But,
I would estimate the parameters of the camera and compare them with the initial parameters at time X. This is easier, if you have saved the images that were used when calibrating the camera at time X. By repeating the calibrating process using those images you should get very similar results, if the camera calibration is still valid.
Edit: Why you need the set of images used in the calibration process at time X?
You have a set of images to do the calibrations for the first time, right? To recalibrate the camera you need to use a new set of images. But for checking the previous calibration, you can use the previous images. If the parameters of the camera are changes, there would be an error between the re-estimation and the first estimation. This can be used for evaluating the validity of the calibration not for recalibrating the camera.

eye position mapping with the screen pixel

I am currently doing a project called eye controlled cursor using MATLAB.
I have few stages before I extract out the center of the iris (which can be considered as a pupil location). face detetcion - > eye detection -- > iris detection -->And finally i have obtained the center of the iris as show in the figure.
Now, I am trying to map this position (X,Y) to my computer screen pixel (1366 x 768). In most of the journals I have found, they require a reference point such as lips, nose or eye corner. But I am only able to extract the center of iris by doing certain thresholding. How can i map this position (X,Y) to my computer screen pixel (1366 x 768)?
Well you either have to fix the head to a certain position (which isn't very practical) or you will have to adapt to the face position. Depending on your image, you will have to choose points that are always on that image and are easy to detect. If you just have one point (like the nose), you can only adjust for the x/y shift of your head. If you have more points (like the 4 corners of the eye, the nose, maybe the corners of the mouth), you can also extract the 3 rotational values of the head and therefore calculate the direction of sight much better. For a first approach, I guess only the two inner corners of the eye (they are "easy" to detect) will do.
I would also recommend using a calibration sequency. You present the user with a sequence of 4 red points in the corners of the screen and he has to look at them. You can then record the positions of the pupils and interpolate between them.

How to calculate ios accelerometer horizontal plane

I am working on an IOS AR project, now i 've done with Camera, GPS with altitude, Compass heading but i can not get the right vector of gravity to draw the right horizontal plane. Please help me with my problem.
- Calculate horizontal plane and draw on the camera view. (And with altitude will be so good)
- Maybe project wll help me a lot.
Please help.
Thanks you very much.
Record some filtered accelerometer samples while not moving the device.
Compute the average of all those samples to get the down vector.
Vectors perpendicular to it make up the horizontal planes. Taking the dot product of the down vector with a vector along the Z axis (0,0,1) would allow you to figure out the angle of the screen relative to the horizon (see accelerometer axes)
I haven't tried this, but that would be my approach... hope it helps somehow