ARKit project point with previous device position - swift

I'm combining ARKit with a CNN to constantly update ARKit nodes when they drift. So:
1. Get an estimate of the node position with ARKit and place a virtual object in the world.
2. Use the CNN to get an estimated 2D location of the object.
3. Update the node position accordingly (to refine its location in 3D space).
The problem is that step #2 takes about 0.3 s. Therefore I can't use sceneView.unprojectPoint, because by then the point would correspond to a 3D point based on the device's world position from step #1.
How do I calculate the 3D vector from my old location to the CNN's 2D point?

unprojectPoint is just a matrix-math convenience function similar to those found in many graphics-oriented libraries (like DirectX, old-style OpenGL, Three.js, etc). In SceneKit, it's provided as a method on the view, which means it operates using the model/view/projection matrices and viewport the view currently uses for rendering. However, if you know how that function works, you can implement it yourself.
An Unproject function typically does two things:
1. Convert viewport coordinates (pixels) to the clip-space coordinate system (-1.0 to 1.0 in all directions).
2. Reverse the projection transform (assuming some arbitrary Z value in clip space) and the view (camera) transform to get to 3D world-space coordinates.
Given that knowledge, we can build our own function. (Warning: untested.)
import simd
import CoreGraphics

func unproject(screenPoint: float3, // see below for Z depth hint discussion
               modelView: float4x4,
               projection: float4x4,
               viewport: CGRect) -> float3 {
    // viewport to clip: subtract the viewport origin, divide by its size,
    // then scale/offset from 0...1 to -1...1 coordinate space.
    // (The Z hint is already 0...1, so it only needs the scale/offset step.
    //  Note: UIKit-style screen points have Y pointing down, so you may also need a Y flip.)
    let pixel = screenPoint - float3(Float(viewport.minX), Float(viewport.minY), 0)
    let normalized = pixel / float3(Float(viewport.width), Float(viewport.height), 1)
    let clip = normalized * 2 - 1
    // apply the reverse of the model-view-projection transform
    let inversePM = (projection * modelView).inverse
    let result = inversePM * float4(clip.x, clip.y, clip.z, 1.0)
    return float3(result.x, result.y, result.z) / result.w // perspective divide
}
Now, to use it... The modelView matrix you pass to this function is the inverse of ARCamera.transform, and you can also get projectionMatrix directly from ARCamera. So, if you're grabbing a 2D position at one point in time, grab the camera matrices then, too, so that you can work backward to 3D as of that time.
There's still the issue of that "Z depth hint" I mentioned: when the renderer projects 3D to 2D it loses information (one of those D's, actually). So you have to recover or guess that information when you convert back to 3D — the screenPoint you pass in to the above function is the x and y pixel coordinates, plus a depth value between 0 and 1. Zero is closer to the camera, 1 is farther away. How you make use of that sort of depends on how the rest of your algorithm is designed. (At the very least, you can unproject both Z=0 and Z=1, and you'll get the endpoints of a line segment in 3D, with your original point somewhere along that line.)
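For example, here's a minimal, untested sketch of calling it with matrices captured at the moment the frame was handed to the CNN. The names `cnnPoint` (a CGPoint from your CNN), `camera` (the ARCamera of that frame), and `viewport` (the view's bounds at that time) are illustrative, not from the question:

// Capture camera.transform and camera.projectionMatrix when you grab the frame for the CNN.
let modelView = camera.transform.inverse        // world-to-camera (view matrix)
let projection = camera.projectionMatrix

let near = unproject(screenPoint: float3(Float(cnnPoint.x), Float(cnnPoint.y), 0),
                     modelView: modelView, projection: projection, viewport: viewport)
let far  = unproject(screenPoint: float3(Float(cnnPoint.x), Float(cnnPoint.y), 1),
                     modelView: modelView, projection: projection, viewport: viewport)
// `near` and `far` bound a ray in world space as of the old frame; pick the point on it
// that matches your depth estimate (e.g. the node's last known distance from that camera).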
Of course, whether this can actually be put together with your novel CNN-based approach is another question entirely. But at least you learned some useful 3D graphics math!

Size of 3D models in AR

I am trying to place large 3D models (SCNNode) in an ARSCNView using ARKit.
The approximate size is as follows: 99 x 184 x 43.
I have been through the following links:
is-there-any-size-limitations-on-3d-files-loaded-using-arkit
load-large-3d-object-scn-file-in-arscnview-aspect-fit-in-to-the-screen-arkit-sw
As per the above links, in the upvoted answer by alex papa, the model gets placed in the scene, but it hangs in the air instead of sitting on the detected/tapped horizontal plane from the hit test.
The x and z positions are right, but y is some meters above the horizontal plane.
I need the scale to be 1.0. Without scaling the 3D model down, is it possible to place and visualise it correctly?
Any help or leads would be appreciated.
The unit of measurement in ARKit, SceneKit and RealityKit is meters, so your model is effectively 99 m x 184 m x 43 m. The solution is simple – scale it to one hundredth of its nominal size:
let scaleFactor: Float = 0.01
node.scale = SCNVector3(scaleFactor, scaleFactor, scaleFactor)
And here you can read about positioning of the pivot point.
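If the model still floats after scaling, a common follow-up (a sketch under assumed names, not code from the question or answer) is to shift the pivot to the bottom of the model's bounding box, so the node's y can come straight from the plane hit test. `modelNode`, `hitResult` and `sceneView` are placeholders:

let scaleFactor: Float = 0.01
modelNode.scale = SCNVector3(scaleFactor, scaleFactor, scaleFactor)

// Put the bottom of the model's bounding box at the node's origin,
// so positioning the node on the plane puts the model on the plane.
let (minBounds, _) = modelNode.boundingBox
modelNode.pivot = SCNMatrix4MakeTranslation(0, minBounds.y, 0)

// Place the node at the hit-test result on the detected plane.
let t = hitResult.worldTransform.columns.3
modelNode.position = SCNVector3(t.x, t.y, t.z)
sceneView.scene.rootNode.addChildNode(modelNode)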

(UNITY) Plane not rotating to normal vector of three points?

I am trying to get a stretched out cube (which we can call a plane for the sake of discussion) to orient itself to the normal vector of a plane described by three points. I wrote a script to find the normal of three points, and then used transform.LookAt to have the planes align. However, I am finding that this script is not working at all how it is intended to and despite my best efforts I can not figure out why.
Drastic movements of the individual points hardly affect the plane's rotation.
The rotation of the object, when using the existing points in the script, should be (0, 0, 0) in the Inspector. However, it is always off by a few degrees and, as I said, does not align itself when I move the points around.
This is the script. I can also post photos showing the behavior or share a small Unity package.
First of all, Transform.LookAt takes a position as its parameter, not a direction!
And then it
Rotates the transform so the forward vector points at worldPosition.
Doesn't sound like what you are trying to achieve.
If you want your object to look with its forward vector in the given normal direction (assuming you are calculating the normal correctly), then you could rather use Quaternion.LookRotation:
transform.rotation = Quaternion.LookRotation(doNormal(cpit, cmit, ctht));
Alternatively, you can simply assign the corresponding vector directly, e.g.
transform.forward = doNormal(cpit, cmit, ctht);
or
transform.up = doNormal(cpit, cmit, ctht);
depending on your needs.

mathematical movable mesh in swift with SceneKit

I am a mathematician who wants to program a geometric game.
I have the exact coordinates, and math formulae, of a few meshes I need to display and of their unit normals.
I need only one texture (colored reflective metal) per mesh.
I need to have the user move pieces, i.e. change the coordinates of a mesh, again by a simple math formula.
So I don't need to import 3D files, but rather I can compute everything.
Imagine a kind of Rubik cube. Cube coordinates are computed, and cubelets are rotated by the user. I have the program functioning in Mathematica.
I am having a very hard time, for sleepless days now, trying to find exactly how to display a computed mesh in SceneKit - with each vertex and normal animated separately.
ANY working example of, say, a single triangle with computed coordinates (rather than a stock provided shape), displayed with animatable coordinates by SceneKit would be EXTREMELY appreciated.
I looked more, and it seems that individual points of a mesh may not be movable in SceneKit. I like the fact that in SceneKit (unlike OpenGL) one can get the objects under the user's finger. Can one mix OpenGL and SceneKit together in a project?
I could take over from there....
Animating vertex positions individually is, in general, a tricky problem. But there are good ways to approach it in SceneKit.
A GPU really wants to have vertex data all uploaded in one chunk before it starts rendering a frame. That means that if you're continually calculating new vertex positions/normals/etc on the CPU, you have the problem of schlepping all that data over to the GPU every time even just part of it changes.
Because you're already describing your surface mathematically, you're in a good position to do that work on the GPU itself. If each vertex position is a function of some variable, you can write that function in a shader, and find a way to pass the input variable per vertex.
There are a couple of options you could look at for this:
Shader modifiers. Start with a dummy geometry that has the topology you need (number of vertices & how they're connected as polygons). Pass your input variable as an extra texture, and in your shader modifier code (for the geometry entry point), look up the texture, evaluate your function, and set the vertex position with the result. (A minimal sketch of the geometry entry point follows this list.)
Metal compute shaders. Create a geometry source backed by a Metal buffer, then at render time, enqueue a compute shader that writes vertex data to that buffer according to your function. (There's skeletal code for part of that at the link.)
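To make the shader-modifier option concrete, here is a minimal, untested sketch of the geometry entry point. It just displaces each vertex along its normal using SceneKit's built-in u_time uniform; the sine term is a stand-in for whatever formula you actually have, and `amplitude` is a made-up custom uniform:

// Sketch only: per-vertex displacement evaluated on the GPU each frame.
let geometryModifier = """
uniform float amplitude;
#pragma body
_geometry.position.xyz += _geometry.normal * amplitude * sin(u_time + _geometry.position.x);
"""

let geometry = SCNSphere(radius: 1)           // stand-in for your own computed geometry
geometry.shaderModifiers = [.geometry: geometryModifier]
geometry.setValue(0.2, forKey: "amplitude")   // custom uniforms are settable via key-value coding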
Update: From your comments it sounds like you might be in an easier position.
If what you have is geometry composed of pieces that are static with respect to themselves and move with respect to each other — like the cubelets of a Rubik's cube — computing vertices at render time is overkill. Instead, you can upload the static parts of your geometry to the GPU once, and use transforms to position them relative to each other.
The way to do this in SceneKit is to create separate nodes, each with its own (static) geometry for each piece, then set node transforms (or positions/orientations/scales) to move the nodes relative to one another. To move several nodes at once, use node hierarchy — make several of them children of another node. If some need to move together at one moment, and a different subset need to move together later, you can change the hierarchy.
Here's a concrete example of the Rubik's cube idea. First, creating some cubelets:
// convenience for creating solid color materials
func materialWithColor(_ color: NSColor) -> SCNMaterial {
    let mat = SCNMaterial()
    mat.diffuse.contents = color
    mat.specular.contents = NSColor.white
    return mat
}

// create and arrange a 3x3x3 array of cubelets
var cubelets: [SCNNode] = []
for x in -1...1 {
    for y in -1...1 {
        for z in -1...1 {
            let box = SCNBox()
            box.chamferRadius = 0.1
            box.materials = [
                materialWithColor(NSColor.green),
                materialWithColor(NSColor.red),
                materialWithColor(NSColor.blue),
                materialWithColor(NSColor.orange),
                materialWithColor(NSColor.white),
                materialWithColor(NSColor.yellow),
            ]
            let node = SCNNode(geometry: box)
            node.position = SCNVector3(x: CGFloat(x), y: CGFloat(y), z: CGFloat(z))
            scene.rootNode.addChildNode(node)
            cubelets.append(node)
        }
    }
}
Next, the process of doing a rotation. This is one specific rotation, but you could generalize this to a function that does any transform of any subset of the cubelets:
// create a temporary node for the rotation
let rotateNode = SCNNode()
scene.rootNode.addChildNode(rotateNode)

// grab the set of cubelets whose position is along the right face of the puzzle,
// and add them to the rotation node
let rightCubelets = cubelets.filter { node in
    abs(node.position.x - 1) < 0.001
}
rightCubelets.forEach { rotateNode.addChildNode($0) }

// animate a rotation
SCNTransaction.begin()
SCNTransaction.animationDuration = 2
rotateNode.eulerAngles.x += .pi / 2
SCNTransaction.completionBlock = {
    // after animating, remove the cubelets from the rotation node,
    // and re-add them to the parent node with their transforms altered
    for cubelet in rotateNode.childNodes {
        cubelet.transform = cubelet.worldTransform
        cubelet.removeFromParentNode()
        scene.rootNode.addChildNode(cubelet)
    }
    rotateNode.removeFromParentNode()
}
SCNTransaction.commit()
The magic part is in the cleanup after the animation. The cubelets start out as children of the scene's root node, and we temporarily re-parent them to another node so we can transform them together. Upon returning them to be the root node's children again, we set each one's local transform to its worldTransform, so that it keeps the effect of the temporary node's transform changes.
You can then repeat this process to grab whatever set of nodes are in a (new) set of world space positions and use another temporary node to transform those.
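As a sketch of that generalization (my wrapper, not code from the question or the answer above), the grab/animate/bake-back steps can be wrapped in one function that takes any subset of nodes and any transform change:

// Sketch: animate any change on any subset of cubelets via a temporary parent node,
// then bake the resulting transforms back into the nodes.
func animate(_ nodes: [SCNNode], in scene: SCNScene,
             duration: CFTimeInterval, change: (SCNNode) -> Void) {
    let groupNode = SCNNode()
    scene.rootNode.addChildNode(groupNode)
    nodes.forEach { groupNode.addChildNode($0) }

    SCNTransaction.begin()
    SCNTransaction.animationDuration = duration
    change(groupNode)
    SCNTransaction.completionBlock = {
        for node in groupNode.childNodes {
            node.transform = node.worldTransform      // keep the accumulated transform
            node.removeFromParentNode()
            scene.rootNode.addChildNode(node)
        }
        groupNode.removeFromParentNode()
    }
    SCNTransaction.commit()
}

Calling it as animate(rightCubelets, in: scene, duration: 2) { $0.eulerAngles.x += .pi / 2 } reproduces the rotation above.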
I'm not sure quite how Rubik's-cube-like your problem is, but it sounds like you can probably generalize a solution from something like this.

3D trajectory reconstruction from video (taken by a single camera)

I am currently trying to reconstruct a 3D trajectory of a falling object like a ball or a rock out of a sequence of images taken from an iPhone video.
Where should I start looking? I know I have to calibrate the camera (I think I'll use the matlab calibration toolbox by Jean-Yves Bouguet) and then find the vanishing point from the same sequence, but then I'm really stuck.
read this: http://www.cs.auckland.ac.nz/courses/compsci773s1c/lectures/773-GG/lectA-773.htm
it explains 3d reconstruction using two cameras. Now for a simple summary, look at the figure from that site:
You only know pr/pl, the image points. By tracing a line from their respective focal points Or/Ol you get two lines (Pr/Pl) that both contain the point P. Because you know the two cameras' origins and orientations, you can construct 3D equations for these lines. Their intersection is thus the 3D point; voila, it's that simple.
But when you discard one camera (let's say the left one), you only know the line Pr for sure. What's missing is depth. Luckily, you know the radius of your ball; this extra information can give you the missing depth. See the next figure (don't mind my paint skills):
Now you know the depth using the intercept theorem.
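(To make that concrete with the usual pinhole approximation: with focal length f in pixels, real ball radius R, and imaged ball radius r in pixels, the intercept theorem gives roughly Z ≈ f * R / r for the depth along the viewing ray.)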
I see one last issue: the shape of the ball changes when it is projected at an angle (i.e. not perpendicular to your capture plane). However, you do know the angle, so compensation is possible, but I leave that up to you :p
edit: replying to @ripkars' comment (the comment box was too small)
1) ok
2) aha, the correspondence problem :D Typically solved by correlation analysis or matching features (mostly matching followed by tracking in a video). (other methods exist too)
I haven't used the image/vision toolbox myself, but there should definitely be some things to help you on the way.
3) = calibration of your cameras. Normally you should only do this once, when installing the cameras (and every other time you change their relative pose)
4) yes, just put the Longuet-Higgins equation to work, i.e. solve:
P = C1 + mu1*R1*K1^(-1)*p1
P = C2 + mu2*R2*K2^(-1)*p2
with
P = 3D point to find
C = camera center (vector)
R = rotation matrix expressing the orientation of the camera in the world frame (R1 for the first camera, R2 for the second)
K = calibration matrix of the camera (containing internal parameters of the camera, not to be confused with the external parameters contained by R and C)
p1 and p2 = the image points
mu = parameter expressing the position of P on the projection line from the camera center C to P (if I'm correct, R*K^(-1)*p expresses a line/direction vector pointing from C to P)
These are 6 equations in 5 unknowns: mu1, mu2 and the three coordinates of P.
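As a hedged illustration of that solve (not from the original answer, and written in Swift with simd just for concreteness), a common way to handle noise is to take the midpoint of the shortest segment between the two back-projection rays, assuming you have already formed the directions d1 = R1*K1^(-1)*p1 and d2 = R2*K2^(-1)*p2:

import simd

// Sketch: closest-point triangulation of two rays
//   P = c1 + mu1 * d1   and   P = c2 + mu2 * d2
// where c = camera center and d = R * K^-1 * p (not necessarily normalized).
func triangulate(c1: SIMD3<Double>, d1: SIMD3<Double>,
                 c2: SIMD3<Double>, d2: SIMD3<Double>) -> SIMD3<Double> {
    let w0 = c1 - c2
    let a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2)
    let d = dot(d1, w0), e = dot(d2, w0)
    let denom = a * c - b * b            // ~0 when the rays are nearly parallel
    let mu1 = (b * e - c * d) / denom
    let mu2 = (a * e - b * d) / denom
    // midpoint of the shortest segment between the two (generally skew) lines
    return (c1 + mu1 * d1 + c2 + mu2 * d2) / 2.0
}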
edit: replying to @ripkars' comment (comment box too small once again)
The only computer vision library that pops up in my mind is OpenCV (http://opencv.willowgarage.com/wiki). But that's a C library, not MATLAB... I guess Google is your friend ;)
About the calibration: yes, if those two images contain enough information to match some features. If you change the relative pose of the cameras, you'll have to recalibrate of course.
The choice of the world frame is arbitrary; it only becomes important when you want to analyze the retrieved 3d data afterwards: for example you could align one of the world planes with the plane of motion -> simplified motion equation if you want to fit one.
This world frame is just a reference frame, changeable with a 'change of reference frame transformation' (translation and/or rotation transformation)
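(For a falling object, fitting a motion model typically means something like P(t) = P0 + V0*t + 0.5*g*t^2, with g along the axis you aligned with "down", fitted to the triangulated 3D points by linear least squares.)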
Unless you have a stereo camera, you will never be able to know the position for sure, even with a calibrated camera, because you don't know whether the ball is small and close or large and far away.
There are other methods with a single camera, based on a series of images with different focus. But I doubt that you can control the camera of your cell phone in that way.
Edit(1):
As @GuntherStruyf correctly points out, you can know the position if one of your inputs is the size of the ball.

Screen-to-World coordinate conversion in OpenGLES an easy task?

The Screen-to-world problem on the iPhone
I have a 3D model (CUBE) rendered in an EAGLView and I want to be able to detect when I am touching the center of a given face (From any orientation angle) of the cube. Sounds pretty easy but it is not...
The problem:
How do I accurately relate screen-coordinates (touch point) to world-coordinates (a location in OpenGL 3D space)? Sure, converting a given point into a 'percentage' of the screen/world-axis might seem the logical fix, but problems would arise when I need to zoom or rotate the 3D space. Note: rotating & zooming in and out of the 3D space will change the relationship of the 2D screen coords with the 3D world coords...Also, you'd have to allow for 'distance' in between the viewpoint and objects in 3D space. At first, this might seem like an 'easy task', but that changes when you actually examine the requirements. And I've found no examples of people doing this on the iPhone. How is this normally done?
An 'easy' task?:
Sure, one might undertake the task of writing an API to act as a go-between between screen and world, but the task of creating such a framework would require some serious design and would likely take 'time' to do -- NOT something that can be one-manned in 4 hours...And 4 hours happens to be my deadline.
The question:
What are some of the simplest ways to know if I touched specific locations in 3D space in the iPhone OpenGL ES world?
You can now find gluUnProject in http://code.google.com/p/iphone-glu/. I've no association with the iphone-glu project and haven't tried it yet myself, just wanted to share the link.
How would you use such a function? This PDF mentions that:
The Utility Library routine gluUnProject() performs this reversal of the transformations. Given the three-dimensional window coordinates for a location and all the transformations that affected them, gluUnProject() returns the world coordinates from where it originated.
int gluUnProject(GLdouble winx, GLdouble winy, GLdouble winz,
                 const GLdouble modelMatrix[16], const GLdouble projMatrix[16],
                 const GLint viewport[4], GLdouble *objx, GLdouble *objy, GLdouble *objz);
Map the specified window coordinates (winx, winy, winz) into object coordinates, using transformations defined by a modelview matrix (modelMatrix), projection matrix (projMatrix), and viewport (viewport). The resulting object coordinates are returned in objx, objy, and objz. The function returns GL_TRUE, indicating success, or GL_FALSE, indicating failure (such as a noninvertible matrix). This operation does not attempt to clip the coordinates to the viewport or eliminate depth values that fall outside of glDepthRange().
There are inherent difficulties in trying to reverse the transformation process. A two-dimensional screen location could have originated from anywhere on an entire line in three-dimensional space. To disambiguate the result, gluUnProject() requires that a window depth coordinate (winz) be provided and that winz be specified in terms of glDepthRange(). For the default values of glDepthRange(), winz at 0.0 will request the world coordinates of the transformed point at the near clipping plane, while winz at 1.0 will request the point at the far clipping plane.
Example 3-8 (again, see the PDF) demonstrates gluUnProject() by reading the mouse position and determining the three-dimensional points at the near and far clipping planes from which it was transformed. The computed world coordinates are printed to standard output, but the rendered window itself is just black.
In terms of performance, I found this quickly via Google as an example of what you might not want to do using gluUnProject, with a link to what might lead to a better alternative. I have absolutely no idea how applicable it is to the iPhone, as I'm still a newb with OpenGL ES. Ask me again in a month. ;-)
You need to have the OpenGL projection and modelview matrices. Multiply them to get the modelview-projection matrix. Invert this matrix to get a matrix that transforms clip space coordinates into world coordinates. Transform your touch point so it corresponds to clip coordinates: the center of the screen should be zero, while the edges should be +1/-1 for X and Y respectively.
Construct two points, one at (0,0,0) and one at (touch_x, touch_y, -1), and transform both by the inverse modelview-projection matrix.
Divide each transformed point by its w component (undoing the perspective divide).
You should get two points describing a line from the center of the camera into "the far distance" (the far plane).
Do picking based on simplified bounding boxes of your models. You should be able to find ray/box intersection algorithms aplenty on the web.
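For the ray/box test, here's a minimal sketch of the standard slab method (written in Swift with simd for concreteness, since the answer gives no code; names are illustrative, with origin = nearPoint and direction = farPoint - nearPoint from the unprojection above):

import simd

// Slab method: a ray hits an axis-aligned box if the intervals where it is
// inside each pair of parallel planes overlap.
func rayHitsBox(origin: SIMD3<Float>, direction: SIMD3<Float>,
                boxMin: SIMD3<Float>, boxMax: SIMD3<Float>) -> Bool {
    let invDir = SIMD3<Float>(repeating: 1) / direction   // division by zero yields infinities, which the comparisons tolerate
    let t0 = (boxMin - origin) * invDir
    let t1 = (boxMax - origin) * invDir
    let tMin = simd_min(t0, t1)
    let tMax = simd_max(t0, t1)
    let tNear = max(tMin.x, max(tMin.y, tMin.z))          // latest entry across the slabs
    let tFar  = min(tMax.x, min(tMax.y, tMax.z))          // earliest exit
    return tNear <= tFar && tFar >= 0                     // intervals overlap and the box isn't behind the ray
}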
Another solution is to paint each of the models in a slightly different color into an offscreen buffer and read the color at the touch point from there, telling you which object was touched.
Here's source for a cursor I wrote for a little project using bullet physics:
float x = ((float)mpos.x / screensize.x) *  2.0f - 1.0f;
float y = ((float)mpos.y / screensize.y) * -2.0f + 1.0f;
p2 = renderer->camera.unProject(vec4(x, y, 1.0f, 1));
p2 /= p2.w;
vec4 pos = activecam.GetView().col_t;
p1 = pos + (((vec3)p2 - (vec3)pos) / 2048.0f * 0.1f);
p1.w = 1.0f;

btCollisionWorld::ClosestRayResultCallback rayCallback(btVector3(p1.x, p1.y, p1.z), btVector3(p2.x, p2.y, p2.z));
game.dynamicsWorld->rayTest(btVector3(p1.x, p1.y, p1.z), btVector3(p2.x, p2.y, p2.z), rayCallback);
if (rayCallback.hasHit())
{
    btRigidBody* body = btRigidBody::upcast(rayCallback.m_collisionObject);
    if (body == game.worldBody)
    {
        renderer->setHighlight(0);
    }
    else if (body)
    {
        Entity* ent = (Entity*)body->getUserPointer();
        if (ent)
        {
            renderer->setHighlight(dynamic_cast<ModelEntity*>(ent));
            //cerr << "hit ";
            //cerr << ent->getName() << endl;
        }
    }
}
Imagine a line that extends from the viewer's eye through the screen touch point into your 3D model space. If that line intersects any of the cube's faces, then the user has touched the cube.
Two solutions present themselves. Both of them should achieve the end goal, albeit by a different means: rather than answering "what world coordinate is under the mouse?", they answer the question "what object is rendered under the mouse?".
One is to draw a simplified version of your model to an off-screen buffer, rendering the center of each face using a distinct color (and adjusting the lighting so color is preserved identically). You can then detect those colors in the buffer (e.g. pixmap), and map mouse locations to them.
The other is to use OpenGL picking. There's a decent-looking tutorial here. The basic idea is to put OpenGL in select mode, restrict the viewport to a small (perhaps 3x3 or 5x5) window around the point of interest, and then render the scene (or a simplified version of it) using OpenGL "names" (integer identifiers) to identify the components making up each face. At the end of this process, OpenGL can give you a list of the names that were rendered in the selection viewport. Mapping these identifiers back to original objects will let you determine what object is under the mouse cursor.
Google for opengl screen to world (for example there’s a thread where somebody wants to do exactly what you are looking for on GameDev.net). There is a gluUnProject function that does precisely this, but it’s not available on iPhone, so you have to port it (see this source from the Mesa project). Or maybe there’s already some publicly available source somewhere?