Can the coordinates of a point that a person is looking at be determined from a photo of their eyes? - matlab

I am planning to develop a program in MATLAB/Python that can determine the coordinates of a point in space that a person is looking at. I was hoping to do it by placing a headset with a camera mounted in front of the eyes and tracking the pupils of both eyes. I still don't know if this is possible, which is why I need help.

This is a challenging task.
To some extent, you can determine the outline of the pupils, and a little of the iris, by image processing. Fitting an ellipse to each outline and applying a little geometry to the ellipse axes gives you an approximate direction of gaze for each eye. Casting a ray from each eye along its gaze direction then gives you a point of convergence.
The coordinates you will obtain will be relative to the image plane, i.e. to the position of the camera, which will move with the head.
In practice, the exact outlining of the ellipses will be inaccurate, the two rays will fail to cross exactly, corneal refraction and perspective will alter the measurements, the headset will shift a little, and so on.
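For the convergence step, since the two rays will rarely cross exactly, a common trick is to take the midpoint of the shortest segment between them. A minimal sketch in Python (not a full solution), assuming you already have an eye position and an approximate gaze direction per eye in the head-mounted camera's coordinate frame:

import numpy as np

def gaze_convergence(p1, d1, p2, d2):
    # p1, p2: eye positions; d1, d2: gaze directions (one ray per eye).
    # Returns the midpoint of the shortest segment between the two rays,
    # or None if they are (nearly) parallel.
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    r = p2 - p1
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ r, d2 @ r
    denom = a * c - b * b          # ~0 when the rays are parallel
    if abs(denom) < 1e-9:
        return None
    t1 = (d * c - e * b) / denom   # parameter along the first ray
    t2 = (d * b - e * a) / denom   # parameter along the second ray
    return 0.5 * ((p1 + t1 * d1) + (p2 + t2 * d2))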

Related

Moving around the surface of an Earth shaped spheroid in Unity

I'm trying to make a Unity game that allows the user to explore the surface of an Earth shaped spheroid, based on WGS84.
The project so far is on Github, and there's a YouTube video of this behaviour.
A shape the size of Earth is way too big for Unity, so I just spawn tiles near the user, offset so that the first tile is at Unity's origin point. This bit works.
The issue is moving around. I've been using an approach where I get the user's position in ECEF coordinates, then normalise that to provide the global orientation for the player, then I translate the player forward based on that and their rotation.
The issue with this is that normalising the ECEF coordinate means the player effectively moves on a sphere, but the WGS84 spheroid is not perfectly spherical. So the player sinks into the floor or flies up as you go south or north, respectively.
My question is, how can I allow the user to move around the surface of the spheroid by way of translation? I feel like there might be some way of taking the major/minor axis of the spheroid into account as the player moves, but I'm not sure how to do that.
I have no experience with Unity or computer graphics, I'm approaching it purely from the navigation point of view.
Let's look at the real world.
We want to travel either by walking/driving on the surface or flying at some altitude. When we do, we move in the local coordinate system (North-South, East-West, Up-Down) and we can't see any curvature, so we assume the Earth is flat.
The problem arises when we try to do it on a computer, which is ruthlessly precise and knows the shape of the Earth. We can't assume the Earth is flat, we can't assume the Earth is a sphere. The Earth is a geoid. Fortunately for some purposes we can simplify things and assume the Earth is an ellipsoid. You chose WGS84. Good!
So how do we move around an ellipsoid? Solving the problem analytically is a nightmare. We have to cheat ;)
We assume the Earth is flat for a moment, make a move in a chosen direction in the local coordinate system, write down the altitude of the new position, calculate the global geodetic coordinates (Lat, Long, Alt) of that new point, and then replace the altitude with the one obtained in the local coordinate system. In other words: each time we move forward along a perfectly straight line and diverge from the ellipsoid (just a tiny bit), we force the altitude not to change in relation to the ellipsoid.
Implementation.
You need to be able to freely translate coordinates between geodetic (Lat, Long, Alt) and ECEF. Going from geodetic to ECEF is easy. Finding geodetic coordinates for a given ECEF position is much more complex, there are many different algorithms, I'm sure you should be able to find a ready to use implementation somewhere.
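For reference, a minimal sketch of both conversions for WGS84 (geodetic to ECEF in closed form, ECEF to geodetic by a simple fixed-point iteration), written in plain Python rather than Unity/C# to keep it short; angles are in radians, distances in metres:

import math

A = 6378137.0                  # WGS84 semi-major axis (m)
F = 1 / 298.257223563          # WGS84 flattening
E2 = F * (2 - F)               # first eccentricity squared

def geodetic_to_ecef(lat, lon, alt):
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)   # prime vertical radius
    x = (n + alt) * math.cos(lat) * math.cos(lon)
    y = (n + alt) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - E2) + alt) * math.sin(lat)
    return x, y, z

def ecef_to_geodetic(x, y, z, iterations=5):
    # Simple iterative inverse; a handful of iterations is plenty away from the poles.
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - E2))                # initial guess
    alt = 0.0
    for _ in range(iterations):
        n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)
        alt = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - E2 * n / (n + alt)))
    return lat, lon, alt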
You also need a Local Tangent Plane frame, and to be precise, you are going to use NED (North-East-Down) coordinates.
Let's assume your object is initially at some geodetic position, and you write down its altitude (relative to the ellipsoid). You create a local NED coordinate system with its origin at that point, move the object within that local coordinate system, and write down how much the altitude (or rather the Down coordinate) changed. Then you calculate the ECEF coordinates of the new position and transform them to geodetic (Lat, Long, Alt). Since you have the old altitude and the altitude change in NED coordinates, you know the new altitude, so you apply it to your new geodetic coordinates (brutally replace the Alt in Lat/Long/Alt with the new value).
Then you make another move in the NED coordinates defined for that new position. And so on...
I'm not sure if this is clear - the process is quite complicated. If you can't follow it, shout :)
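To make the process concrete, here is a rough sketch of a single move step using the conversion helpers from the sketch above; the NED axes are the standard local-tangent-frame vectors expressed in ECEF, and the function names are my own:

import numpy as np

def ned_axes(lat, lon):
    # Unit North/East/Down vectors at (lat, lon), expressed in ECEF.
    sl, cl = np.sin(lat), np.cos(lat)
    so, co = np.sin(lon), np.cos(lon)
    north = np.array([-sl * co, -sl * so, cl])
    east  = np.array([-so, co, 0.0])
    down  = np.array([-cl * co, -cl * so, -sl])
    return north, east, down

def move_on_ellipsoid(lat, lon, alt, d_north, d_east, d_down=0.0):
    # One "flat Earth" step: move in the local NED frame, convert back to
    # geodetic, then brutally replace the altitude as described above.
    # Uses geodetic_to_ecef / ecef_to_geodetic from the previous sketch.
    n_vec, e_vec, d_vec = ned_axes(lat, lon)
    p = np.array(geodetic_to_ecef(lat, lon, alt))
    p_new = p + d_north * n_vec + d_east * e_vec + d_down * d_vec
    new_lat, new_lon, _ = ecef_to_geodetic(*p_new)
    return new_lat, new_lon, alt + d_down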

Restoring the image of a face's plane

I am using an API to analyze faces in MATLAB, where for each picture I get a 3x3 rotation matrix of the face's orientation, telling me which direction the head is pointing.
I am trying to normalize the image according to that matrix, so that it will be distorted to get the image of the face's plane. This is something like 'undoing' the projection of the face to the camera plane. For example, if the head is directed a little to the left, it will stretch the left side to (more or less) preserve the face's original proportions.
I tried using 'affine2d' and 'projective2d' with 'imwarp', but that didn't achieve the goal.
Achieving your goal with simple tools like affine transformations seems impossible to me since a face is hardly a flat surface. An extreme example: Imagine the camera recording a profile view of someone's head. How are you going to reconstruct the missing half of the face?
There have been successful attempts to change the orientation of faces in images and real-time video, but the methods used are quite complex:
[We] propose a gaze correction method that needs just a single webcam. We apply recent shape deformation techniques to generate a 3D face model that matches the user’s face. We then render a gaze-corrected version of this face model and seamlessly insert it into the original image.
(Giger et al., https://graphics.ethz.ch/Downloads/Publications/Papers/2014/Gig14a/Gig14a.pdf)
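For what it's worth, the planar approximation the question is attempting can be written as a single homography built from the rotation matrix (sketched below in Python/OpenCV rather than MATLAB). The intrinsic matrix K is an assumption you would have to supply, and, as explained above, this only re-maps pixels that are already visible; it cannot reconstruct the hidden side of the face:

import cv2
import numpy as np

def frontalize_planar(img, R, K):
    # Crude "flat face" approximation: treat the face as a plane and warp the
    # image by the homography K * R^T * K^-1 that undoes the head rotation R.
    H = K @ R.T @ np.linalg.inv(K)
    return cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))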

Open GL - ES 2.0 : Touch detection

Hi guys, I am doing some work on iOS that requires the use of OpenGL ES. So now I have a bunch of squares, cubes and triangles on the screen, and some of these geometries might overlap. Any ideas/approaches for touch detection?
Regards
To follow up on the answer already given, squares, cubes and triangles are convex shapes so you can perform ray-object intersection quite easily, even directly from the geometry rather than from the mathematical description of the perfect object.
You're going to need to be able to calculate the distance of a point from the plane and the intersection of a ray with the plane. As a simple test you can implement yourself very quickly, for each polygon on the convex shape work out the intersection between the ray and the plane. Then check whether that point is behind all the planes defined by polygons that share an edge with the one you just tested. If so then the hit is on the surface of the object — though you should be careful about coplanar adjoining polygons and rounding errors.
Once you've found a collision you can easily get the length of the ray to the point of collision. The object with the shortest distance is the one that's in front.
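A minimal sketch of that simple test in Python (the geometry carries over directly to C/OpenGL ES code); instead of the shared-edge check it uses the equivalent convex-solid formulation, where a plane hit counts only if the hit point is behind every other face plane. The face representation (outward normal plus a point on the plane) is an assumption:

import numpy as np

def ray_hits_convex(origin, direction, faces, eps=1e-6):
    # faces: list of (outward_normal, point_on_plane) pairs for the convex solid.
    # Returns the distance along the ray to the nearest hit, or None.
    best = None
    for i, (n, p0) in enumerate(faces):
        denom = np.dot(n, direction)
        if abs(denom) < eps:
            continue                         # ray parallel to this face's plane
        t = np.dot(n, p0 - origin) / denom
        if t < 0:
            continue                         # plane lies behind the ray origin
        hit = origin + t * direction
        inside = all(np.dot(m, hit - q0) <= eps
                     for j, (m, q0) in enumerate(faces) if j != i)
        if inside and (best is None or t < best):
            best = t
    return best

The object with the smallest returned distance is the one in front, as described above.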
If that's fast enough then great; otherwise you'll probably want to look into partitioning the world or breaking objects down to their silhouettes. Convex objects are really simple: consider all the edges that run between one polygon and the next. If exactly one of those two polygons is front-facing, then the edge is part of the silhouette. All the silhouette edges together can be projected to a convex 2D shape on the view plane, and you can then test touches with a 2D point-in-polygon test against that.
A further common alternative that eliminates most of the maths is picking. You'd render the scene to an invisible buffer with each object appearing as a solid blob in a suitably unique colour. To test for touch, you'd just do a glReadPixels and inspect the colour.
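A minimal sketch of one way to assign and decode those suitably unique colours (the bit-packing scheme here is just an assumption; the actual read-back is the glReadPixels call mentioned above):

def id_to_color(obj_id):
    # Pack an object id into an RGB triple for the off-screen picking pass.
    return (obj_id & 0xFF, (obj_id >> 8) & 0xFF, (obj_id >> 16) & 0xFF)

def color_to_id(r, g, b):
    # Decode the colour read back at the touch point into an object id.
    return r | (g << 8) | (b << 16)

Remember to render the picking pass with lighting, blending and anti-aliasing disabled so the colours come back exactly as written.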
For the purposes of glu on the iPhone, you can grab SGI's implementation (as used by MESA). I've used its tessellator in a shipping, production project before.
I had that problem in the past. What I have used is an implementation of glu unproject that you can find on google (it uses the inverse of the model view projection matrix and the viewport size). This allows you to map the 2D screen coordinates to a 3D vector into the world. Then, you can use this vector to intersect with your objects and see which one intersects (or comes really close to doing so).
I do hope there are better ways of doing this, so I look forward to other answers as well!
Once you have the inverse modelview and have cast your ray (vector), you still need to know whether the ray intersects your geometry. One approach is to grab the depth (z in the view coordinate system) of the object's center and extend (stretch) your vector just that far. Then see whether the vector's "head" ends up within the volume of your object (you need the object's center and, e.g., its radius if it's a sphere).
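A rough sketch of that check, assuming a bounding sphere; here the ray is stretched to its closest approach to the sphere's centre, which is the same idea as extending it to the object's depth:

import numpy as np

def touches_sphere(eye, ray_dir, center, radius):
    # Stretch the ray to the depth of the object's centre and test whether
    # its "head" ends up inside the bounding sphere.
    ray_dir = ray_dir / np.linalg.norm(ray_dir)
    depth = np.dot(center - eye, ray_dir)
    tip = eye + depth * ray_dir
    return np.linalg.norm(tip - center) <= radius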

Is there a way to figure out 3D distance/view angle from a 2D environment using the iPhone/iPad camera?

Maybe I'm asking this too soon in my research, but I'd better know if this is possible sooner than later.
Imagine I have the following square printed on a paper on top of a table:
The table is brown, so it does not match any of the colors in the square. Is there a way for me, using a common iPhone camera (non-stereo view), to figure out the distance and angle from which I'm looking at the square on the table?
In the end what I'm looking for is being able to draw a 3D square on top of this one using the camera image, but I'm not sure if I am going to be able to figure out the distance and position of the object in space using only a 2D image. Any hints are well appreciated.
Short answer: http://weblog.bocoup.com/javascript-augmented-reality
Big answer:
First posterize, then vectorize. With the vectors in hand, you will need to do some math tricks to work out, from the vectors' positions, the perspective and then the camera position.
Maybe these help:
www.pixastic.com/lib/docs/actions/posterize/
github.com/selead/cl-vectorizer
vectormagic.com/home
autotrace.sourceforge.net
www.scipy.org/PyLab
raphaeljs.com/
technabob.com/blog/2007/12/29/video-games-get-vectorized/
superuser.com/questions/88415/is-there-an-open-source-alternative-to-vector-magic
Oughta be possible. Scan the image for the red/blue/yellow pattern, then do edge detection to figure out how warped the squares are (they'll appear as skewed quadrilaterals, roughly parallelograms, in anything but a straight-on view). Distance would depend on the camera's zoom setting and scan resolution. But basically you'd count how many pixels are visible in each of the squares, run that past the camera's specs, and you should be able to determine a rough distance.
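If you can locate the square's four corners in the image and know (or roughly estimate) the camera intrinsics, distance and viewing angle also fall out of a standard pose-from-known-points computation instead of pixel counting. A sketch using OpenCV's solvePnP; the corner ordering, physical side length and intrinsic matrix K are assumptions:

import cv2
import numpy as np

def square_pose(corners_px, side_m, K):
    # corners_px: the square's 4 image corners in a known, consistent order.
    # side_m: physical side length of the printed square; K: 3x3 intrinsics.
    s = side_m / 2.0
    object_pts = np.array([[-s, -s, 0], [ s, -s, 0],
                           [ s,  s, 0], [-s,  s, 0]], dtype=np.float64)
    image_pts = np.asarray(corners_px, dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)              # viewing angle as a rotation matrix
    return float(np.linalg.norm(tvec)), R   # distance to the square, orientation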

Screen-to-World coordinate conversion in OpenGLES an easy task?

The Screen-to-world problem on the iPhone
I have a 3D model (CUBE) rendered in an EAGLView and I want to be able to detect when I am touching the center of a given face (From any orientation angle) of the cube. Sounds pretty easy but it is not...
The problem:
How do I accurately relate screen-coordinates (touch point) to world-coordinates (a location in OpenGL 3D space)? Sure, converting a given point into a 'percentage' of the screen/world-axis might seem the logical fix, but problems would arise when I need to zoom or rotate the 3D space. Note: rotating & zooming in and out of the 3D space will change the relationship of the 2D screen coords with the 3D world coords... Also, you'd have to allow for 'distance' between the viewpoint and objects in 3D space. At first, this might seem like an 'easy task', but that changes when you actually examine the requirements. And I've found no examples of people doing this on the iPhone. How is this normally done?
An 'easy' task?:
Sure, one might undertake the task of writing an API to act as a go-between between screen and world, but the task of creating such a framework would require some serious design and would likely take 'time' to do -- NOT something that can be one-manned in 4 hours...And 4 hours happens to be my deadline.
The question:
What are some of the simplest ways to know if I touched specific locations in 3D space in the iPhone OpenGL ES world?
You can now find gluUnProject in http://code.google.com/p/iphone-glu/. I've no association with the iphone-glu project and haven't tried it yet myself, just wanted to share the link.
How would you use such a function? This PDF mentions that:
The Utility Library routine gluUnProject() performs this reversal of the transformations. Given the three-dimensional window coordinates for a location and all the transformations that affected them, gluUnProject() returns the world coordinates from where it originated.
int gluUnProject(GLdouble winx, GLdouble winy, GLdouble winz,
                 const GLdouble modelMatrix[16], const GLdouble projMatrix[16],
                 const GLint viewport[4], GLdouble *objx, GLdouble *objy, GLdouble *objz);
Map the specified window coordinates (winx, winy, winz) into object coordinates, using transformations defined by a modelview matrix (modelMatrix), projection matrix (projMatrix), and viewport (viewport). The resulting object coordinates are returned in objx, objy, and objz. The function returns GL_TRUE, indicating success, or GL_FALSE, indicating failure (such as a noninvertible matrix). This operation does not attempt to clip the coordinates to the viewport or eliminate depth values that fall outside of glDepthRange().
There are inherent difficulties in trying to reverse the transformation process. A two-dimensional screen location could have originated from anywhere on an entire line in three-dimensional space. To disambiguate the result, gluUnProject() requires that a window depth coordinate (winz) be provided and that winz be specified in terms of glDepthRange(). For the default values of glDepthRange(), winz at 0.0 will request the world coordinates of the transformed point at the near clipping plane, while winz at 1.0 will request the point at the far clipping plane.
Example 3-8 (again, see the PDF) demonstrates gluUnProject() by reading the mouse position and determining the three-dimensional points at the near and far clipping planes from which it was transformed. The computed world coordinates are printed to standard output, but the rendered window itself is just black.
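Since gluUnProject itself isn't shipped with OpenGL ES on the iPhone, it may help to see that the math is just the inverse of the combined transform followed by a perspective divide. A minimal sketch in Python, assuming 4x4 column-vector matrices and the default glDepthRange:

import numpy as np

def unproject(winx, winy, winz, modelview, projection, viewport):
    # Window coordinates -> object/world coordinates (gluUnProject equivalent).
    x, y, w, h = viewport
    ndc = np.array([(winx - x) / w * 2.0 - 1.0,     # window -> NDC in [-1, 1]
                    (winy - y) / h * 2.0 - 1.0,
                    winz * 2.0 - 1.0,
                    1.0])
    inv = np.linalg.inv(projection @ modelview)     # undo projection * modelview
    obj = inv @ ndc
    if obj[3] == 0.0:
        return None                                 # degenerate / non-invertible case
    return obj[:3] / obj[3]                         # perspective divide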
In terms of performance, I found this quickly via Google as an example of what you might not want to do using gluUnProject, with a link to what might lead to a better alternative. I have absolutely no idea how applicable it is to the iPhone, as I'm still a newb with OpenGL ES. Ask me again in a month. ;-)
You need to have the opengl projection and modelview matrices. Multiply them to gain the modelview projection matrix. Invert this matrix to get a matrix that transforms clip space coordinates into world coordinates. Transform your touch point so it corresponds to clip coordinates: the center of the screen should be zero, while the edges should be +1/-1 for X and Y respectively.
Construct two points in clip coordinates, one at (touch_x, touch_y, -1) and one at (touch_x, touch_y, +1) (on the near and far planes respectively), and transform both by the inverse modelview projection matrix.
Divide each transformed point by its w component to get back to 3D coordinates.
You should get two points describing a line that runs from just in front of the camera, through the touch point, into "the far distance" (the far plane).
Do picking based on simplified bounding boxes of your models. You should be able to find ray/box intersection algorithms aplenty on the web.
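For the "ray/box intersection algorithms aplenty" step, the usual choice is the slab test. A minimal sketch, assuming axis-aligned bounding boxes in world space and a ray built from the two transformed points above (origin at the near point, direction towards the far point):

import numpy as np

def ray_intersects_aabb(origin, direction, box_min, box_max):
    # Slab test: intersect the ray with each pair of axis-aligned planes
    # and check that the three parameter intervals overlap.
    t_near, t_far = -np.inf, np.inf
    for axis in range(3):
        if abs(direction[axis]) < 1e-12:
            if origin[axis] < box_min[axis] or origin[axis] > box_max[axis]:
                return False                 # parallel to the slab and outside it
            continue
        t1 = (box_min[axis] - origin[axis]) / direction[axis]
        t2 = (box_max[axis] - origin[axis]) / direction[axis]
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far and t_far >= 0.0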
Another solution is to paint each of the models in a slightly different color into an offscreen buffer and read back the color at the touch point from there, telling you which object was touched.
Here's source for a cursor I wrote for a little project using bullet physics:
float x = ((float)mpos.x / screensize.x) *  2.0f - 1.0f;   // touch x -> NDC in [-1, 1]
float y = ((float)mpos.y / screensize.y) * -2.0f + 1.0f;   // touch y -> NDC, flipped vertically
p2 = renderer->camera.unProject(vec4(x, y, 1.0f, 1));      // far point on the pick ray
p2 /= p2.w;                                                // perspective divide
vec4 pos = activecam.GetView().col_t;                      // camera position (view matrix translation column)
p1 = pos + (((vec3)p2 - (vec3)pos) / 2048.0f * 0.1f);      // near point, just in front of the camera
p1.w = 1.0f;
// Cast the ray p1 -> p2 into the physics world and pick the closest hit.
btCollisionWorld::ClosestRayResultCallback rayCallback(btVector3(p1.x, p1.y, p1.z), btVector3(p2.x, p2.y, p2.z));
game.dynamicsWorld->rayTest(btVector3(p1.x, p1.y, p1.z), btVector3(p2.x, p2.y, p2.z), rayCallback);
if (rayCallback.hasHit())
{
    btRigidBody* body = btRigidBody::upcast(rayCallback.m_collisionObject);
    if (body == game.worldBody)
    {
        renderer->setHighlight(0);                         // hit the ground/world body: clear the highlight
    }
    else if (body)
    {
        Entity* ent = (Entity*)body->getUserPointer();     // user pointer stores the owning entity
        if (ent)
        {
            renderer->setHighlight(dynamic_cast<ModelEntity*>(ent));
            //cerr<<"hit ";
            //cerr<<ent->getName()<<endl;
        }
    }
}
Imagine a line that extends from the viewer's eye through the screen touch point into your 3D model space. If that line intersects any of the cube's faces, then the user has touched the cube.
Two solutions present themselves. Both should achieve the end goal, albeit by different means: rather than answering "what world coordinate is under the mouse?", they answer the question "what object is rendered under the mouse?".
One is to draw a simplified version of your model to an off-screen buffer, rendering the center of each face using a distinct color (and adjusting the lighting so color is preserved identically). You can then detect those colors in the buffer (e.g. pixmap), and map mouse locations to them.
The other is to use OpenGL picking. There's a decent-looking tutorial here. The basic idea is to put OpenGL in select mode, restrict the viewport to a small (perhaps 3x3 or 5x5) window around the point of interest, and then render the scene (or a simplified version of it) using OpenGL "names" (integer identifiers) to identify the components making up each face. At the end of this process, OpenGL can give you a list of the names that were rendered in the selection viewport. Mapping these identifiers back to original objects will let you determine what object is under the mouse cursor.
Google for opengl screen to world (for example there’s a thread where somebody wants to do exactly what you are looking for on GameDev.net). There is a gluUnProject function that does precisely this, but it’s not available on iPhone, so you have to port it (see this source from the Mesa project). Or maybe there’s already some publicly available source somewhere?