Is there a term for an explorable VR scene? - virtual-reality

Is there a tech-community-agreed term for a photographic (well, as close as possible) scene that can be explored by walking around? Obviously, within certain limits. Say, a museum could scan a sculpture with a laser and make it available in VR as a 3D mesh with properly mapped textures. Is there a name for such a thing? The so-called 360 VR photos definitely fall short of such detail.

I think the most common names are:
360 if it's just an image from one point containing all the angles, usually an equirectangular or cubemap texture/video. Some have stereoscopy, but it's very limited.
360 with depth if it's a 360 that, apart from color, also carries depth information. This allows stereoscopy and some movement, but because of occlusion shadowing and the problems with acquiring depth maps it's almost never used. In the future, AI-based filling of shadowed areas, and perhaps replacing the need for capturing depth at all, might make this a commonly used format.
photogrammetry if it's converted to a textured mesh, has proper depth and can be viewed from all angles (for example The Vanishing of Ethan Carter - unfortunately the 3D models from that article seem to be missing; I sent them an email, maybe they'll fix it).
lightfield if it's a volume containing lots of 360 images with some kind of interpolation between them. It has proper depth but can only be viewed from within the mapped volume (see Welcome To Lightfields).

What is the difference between “real-world object”, surface, and AR anchors?

I am looking for a technical answer to the question:
What is the difference between “real-world object”, surface, and AR anchors in ARKit?
I believe, and as far as I can tell:
1) ARKit offers 3 different methods to search for “real-world objects”, surfaces, and AR anchors:
ARSCNView hitTest(_:types:)
https://developer.apple.com/documentation/arkit/arscnview/2875544-hittest
ARSKView hitTest(_:types:)
https://developer.apple.com/documentation/arkit/arskview/2875733-hittest
ARFrame hitTest(_:types:)
https://developer.apple.com/documentation/arkit/arframe/2875718-hittest
I understand that to look for SceneKit/SpriteKit content displayed in the view you need to use different hitTest methods.
I just can’t understand what a “real-world object” is vs. a surface vs. an AR anchor.
My best guess is:
Real-world object:
- I don’t know?
Surface:
- featurePoints
- estimatedHorizontalPlane
- estimatedVerticalPlane
AR anchors:
- ARImageAnchor
- ARFaceAnchor
- ARPlaneAnchor
I think you get the idea.... what is a “real-world object” in ARKit?
Any help would be great. The documentation seems to really emphasize the difference between “real-world object or surface”.
Thank you
Smartdog
We all learn by sharing what we know
My intuition is that whoever wrote Apple’s docs kept things ambiguous because a) you can use those methods for multiple kinds of hit tests, and b) ARKit doesn’t really know what it’s looking at.
If you do a hit test for any of the plane-related types (existingPlane, estimatedHorizontalPlane, etc.), you’re looking for real-world flat surfaces. Or rather, you’re looking for things that ARKit “thinks” look like flat horizontal (or vertical, in iOS 11.3 and later) surfaces. Those might or might not accurately reflect the shape of the real world at any given moment, but they’re ARKit’s best guess. Which of the plane-related types you search for determines whether you get results backed by an existing ARAnchor (the existingPlane variants) or just ARKit’s momentary estimates (the estimatedHorizontalPlane/estimatedVerticalPlane variants).
(Note that false negatives are more common than false positives. For example, you might find no hit result at a point where a corner of a tabletop hasn’t been mapped by ARKit, but you’re unlikely to find a plane hit result with no corresponding real-world flat surface.)
If you do a featurePoint hit test, you’re testing against ARKit’s sparse map of the user’s environment. If you turn on the showFeaturePoints debug option in ARSCNView you can see this map - in each frame of video, ARKit finds tens to a few hundred points that are visually interesting enough (well, “interesting” from a particular algorithm’s point of view) that it can correlate their 2D positions between frames and use parallax differences to estimate their distances from the camera and positions in 3D space. (In turn, that informs ARKit’s idea of where the device itself is in 3D space.)
Because a “feature point” can be any small, high-contrast region in the camera image, it doesn’t really correlate to any specific kind of real-world thing. If you’re looking at a desk with a nice wood-grain pattern, you’ll see a lot of feature points along the plane of the desktop. If you’re looking at a desk with, say, a potted plant on it, you’ll see some points on the desktop and some on the pot and some on the leaves... not enough points that you (or a CV algorithm) can really intuit the shape of the plant. But enough that, if the user were to tap on one of those points, your app could put some 3D object there and it might convincingly appear to stick to the plant.
So, in the most general sense, ARKit hit testing is looking for “objects” of some sort in the “real world” (as perceived by ARKit), but unless you’re looking for planes (surfaces) one can’t really be more specific about what the “objects” might be.
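To make the distinction concrete, here's a minimal Swift sketch (not from Apple's sample code; `sceneView` and `point` are assumed to come from an ordinary ARSCNView setup and a tap gesture). The same hitTest(_:types:) call covers both cases; only the type mask changes, and only existing-plane results are backed by an anchor.

```swift
import ARKit
import UIKit

func inspectHit(at point: CGPoint, in sceneView: ARSCNView) {
    // "Surfaces": planes ARKit is already tracking, plus its current rough estimates.
    let planeHits = sceneView.hitTest(point, types: [.existingPlaneUsingExtent,
                                                     .estimatedHorizontalPlane])
    // "Real-world objects" in the loosest sense: points from ARKit's sparse feature map.
    let featureHits = sceneView.hitTest(point, types: .featurePoint)

    if let hit = planeHits.first {
        // existingPlane* results carry the ARPlaneAnchor ARKit is tracking;
        // estimated* results come back with a nil anchor.
        print("plane hit, anchor:", hit.anchor as Any, "distance:", hit.distance)
    } else if let hit = featureHits.first {
        // featurePoint results never have an anchor - just a world-space transform.
        print("feature point at", hit.worldTransform.columns.3)
    }
}
```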

Alternative to default OpenGL ES lines (3D)?

I'm currently trying to implement a silhouette algorithm in my project (using OpenGL ES; it's for mobile devices, primarily iPhone at the moment). One of the requirements is that a set of 3D lines be drawn. The issue with the default OpenGL lines is that they don't connect nicely at an angle when they are thick (gaps appear). Other subtle artifacts are also evident, which detract from the visual appeal of the lines.
Now, I have looked into using some sort of quad strip as an alternative to this. However, drawing a quad strip in screen space requires some sort of visibility detection - lines obscured in the actual 3D world should not be visible.
There are numerous approaches to this problem - e.g. quantitative invisibility. But such an approach, particularly on a mobile device with limited processing power, is difficult to implement efficiently, considering that raycasting needs to be employed. Looking around some more I found this paper, which describes a couple of methods for using z-buffer sampling to achieve such an effect. However, I'm not an expert in this area, and while I understand the theory behind the techniques to an extent, I'm not sure how to go about the practical implementation. I was wondering if someone could guide me here at a more technical level - on the OpenGL ES side of things. I'm also open to any suggestions regarding 3D line visibility in general.
The z-buffer technique will be too complex for iOS devices - it needs a heavy pixel shader and (IMHO) it will introduce visual artifacts.
If your models are not complex you can find the geometric silhouette at runtime - for example by comparing the normals of polygons that share an edge: if the z components of the two normals in view space have different signs (one normal is directed toward the camera and the other away from it), then that edge should be used for the silhouette (sketched below).
Another approach is more "FPS friendly": keep an extruded version of your model. First render the extruded model in the silhouette color (without textures and lighting), then render the normal model over it. You will need more memory for vertices, but no real-time computations.
PS: In all games I have looked at, the silhouettes were geometric.
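A minimal CPU-side sketch of the first approach (the shared-edge normal comparison); it uses dot products against the direction to the camera, which is the same sign test. The Edge type and its field names are assumptions, not from any particular engine.

```swift
import simd

struct Edge {
    let v0: SIMD3<Float>
    let v1: SIMD3<Float>
    let normalA: SIMD3<Float>   // normal of the first triangle sharing the edge
    let normalB: SIMD3<Float>   // normal of the second triangle sharing the edge
}

/// Returns the edges on the silhouette for a given camera position:
/// one adjacent face points toward the camera, the other points away.
func silhouetteEdges(of edges: [Edge], cameraPosition: SIMD3<Float>) -> [Edge] {
    edges.filter { edge in
        let midpoint = (edge.v0 + edge.v1) * 0.5
        let toCamera = cameraPosition - midpoint
        let facingA = simd_dot(edge.normalA, toCamera)
        let facingB = simd_dot(edge.normalB, toCamera)
        return facingA * facingB < 0   // opposite signs => silhouette edge
    }
}
```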
I have worked out a solution that works nicely on an iPhone 4S (not tested on any other devices). It builds on the idea of rendering world-space quads, and does the silhouette detection all on the GPU. It works along these lines (pun not intended):
We generate edge information. This consists of a list of edges/"lines" in the mesh, and for each we associate two normals which represent the tris on either side of the edge.
This is processed into a set of quads that are uploaded to the GPU - each quad represents an edge. Each vertex of each quad is accompanied by three attributes (vec3s), namely the edge direction vector and the two neighbor tri normals. All quads are passed w/o "thickness" - i.e. the vertices on either end are in the same position. However, the edge direction vector is opposite for each vertex in the same position. This means they will extrude in opposite directions to form a quad when required.
We determine whether a vertex is part of a visible edge in the vertex shader by performing two dot products between each tri norm and the view vector and checking if they have opposite signs. (see standard silhouette algorithms around the net for details)
For vertices that are part of visible edges, we take the cross product of the edge direction vector with the view vector to get a screen-oriented "extrusion" vector. We add this vector to the vertex, but divided by the w value of the projected vertex in order to create a constant thickness quad.
This does not directly resolve the gaps that can appear between neighboring edges, but it is far more flexible when it comes to combating this. One solution may involve bridging the vertices of lines meeting at large angles with another quad, which I am exploring at the moment.
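For reference, here is the per-vertex logic from the steps above written as plain Swift rather than shader code (every name is an assumption). It covers the silhouette test on the two neighboring triangle normals and the screen-oriented extrusion direction; scaling the offset relative to the projected vertex's w value, to keep the on-screen thickness constant as described above, is left to the caller.

```swift
import simd

struct EdgeVertex {
    var position: SIMD3<Float>       // paired vertices start at the same position
    var edgeDirection: SIMD3<Float>  // negated on the paired vertex, so the pair extrudes apart
    var normalA: SIMD3<Float>        // normal of the first adjacent triangle
    var normalB: SIMD3<Float>        // normal of the second adjacent triangle
}

/// Offset to apply to this vertex, or zero if its edge is not on the silhouette.
func extrusionOffset(for v: EdgeVertex,
                     viewVector: SIMD3<Float>,
                     halfThickness: Float) -> SIMD3<Float> {
    // Silhouette test: one adjacent face looks toward the camera, the other away.
    let facingA = simd_dot(v.normalA, viewVector)
    let facingB = simd_dot(v.normalB, viewVector)
    guard facingA * facingB < 0 else { return .zero }   // quad collapses to zero width

    // Cross the edge direction with the view vector to get a screen-oriented side vector.
    return simd_normalize(simd_cross(v.edgeDirection, viewVector)) * halfThickness
}
```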

iPhone iOS: is it possible to create a rangefinder with 2 laser pointers and an iPhone?

I'm working on an iPhone robot that would be moving around. One of the challenges is estimating distance to objects - I don't want the robot to run into things. I saw some very expensive (~$1000) laser rangefinders, and would like to emulate one using the iPhone.
I have one or two camera feeds and two laser pointers. The laser pointers are mounted about 6 inches apart, at an angle. The angle of the lasers in relation to the cameras is known, and the angle of the cameras to each other is known.
The lasers are pointing ahead of cameras, creating 2 dots on a camera feed. Is it possible to estimate the distance to the dots by looking at the distance between the dots in a camera image?
The lasers form a trapezoid:

      / wall \
     /        \
    /laser mount\
As the laser mount gets closer to the wall, the points should be moving further away from each other.
Is what I'm talking about feasible? Has anyone done something like that?
Would I need one or two cameras for such calculation?
If you just don't want to run into things, rather than needing an accurate idea of the distance to them, then you could go "Dambusters" on it and just detect when the two points become one - this happens at a known distance from the object.
For calculation, it is probably cheaper to have four lasers instead, in two pairs, each pair at a different angle, one pair above the other. Then a comparison between the relative differences of the dots would probably let you work out a reasonably accurate distance. Math Overflow for that one, though.
In theory, yes, something like this can work. Google "light striping" or "structured light depth measurement" for some good discussions of using this sort of idea on a larger scale.
In practice, your measurements are likely to be crude. There are a number of factors to consider: the camera intrinsic parameters (focal length, etc) and extrinsic parameters will affect how the dots appear in the image frame.
With only two sample points (note that structured light methods use lines, etc), the environment will present difficulties for distance measurement. Surfaces that are directly perpendicular to the floor (and direction of travel) can be handled reasonably well. Slopes and off-angle walls may be detectable, but you will find many situations that will give ambiguous or incorrect distance measures.
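To give a feel for the numbers, here is a back-of-the-envelope sketch for the converging-laser case with a single camera centered between the pointers. Every name and simplification (perpendicular wall, pinhole model, focal length already expressed in pixels) is an assumption, not something from the question.

```swift
import Foundation

/// Rough distance estimate from the gap between the two laser dots in the image.
/// Dot separation on the wall: s = baseline - 2 * d * tan(laserAngle)
/// Projected into the image:   p = f * s / d
/// Solving for d:              d = f * baseline / (p + 2 * f * tan(laserAngle))
func estimateDistance(pixelSeparation: Double,    // gap between the dots, in pixels
                      baseline: Double,           // laser separation on the mount, in meters
                      laserAngle: Double,         // inward tilt of each laser, in radians
                      focalLengthPixels: Double) -> Double {
    return focalLengthPixels * baseline
         / (pixelSeparation + 2 * focalLengthPixels * tan(laserAngle))
}

// Example: 6-inch baseline, 2 degrees of inward tilt, f ~ 1400 px, dots 90 px apart.
let d = estimateDistance(pixelSeparation: 90, baseline: 0.1524,
                         laserAngle: 2 * Double.pi / 180, focalLengthPixels: 1400)
print(String(format: "estimated distance ~ %.2f m", d))
```

With only one camera the accuracy will be limited by how precisely the dot centers can be located and by the slope of the surface, as the answer above points out.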

Calculating corresponding pixels

I have a computer vision setup with two cameras. One of these cameras is a time-of-flight camera. It gives me the depth of the scene at every pixel. The other camera is a standard camera giving me a colour image of the scene.
We would like to use the depth information to remove some areas from the colour image. We plan on object, person and hand tracking in the colour image and want to remove far-away background pixels with the help of the time-of-flight camera. It is not certain yet whether the cameras can be aligned in a parallel setup.
We could use OpenCV or Matlab for the calculations.
I have read a lot about rectification, epipolar geometry, etc., but I still have trouble seeing the steps I have to take to calculate the correspondence for every pixel.
What approach would you use, and which functions can be used? Into which steps would you divide the problem? Is there a tutorial or sample code available somewhere?
Update: We plan on doing an automatic calibration using known markers placed in the scene.
If you want robust correspondences, you should consider SIFT. There are several implementations in MATLAB - I use the Vedaldi-Fulkerson VLFeat library.
If you really need fast performance (and I think you don't), you should think about using OpenCV's SURF detector.
If you have any other questions, do ask. This other answer of mine might be useful.
PS: By correspondences, I'm assuming you want to find the coordinates of a projection of the same 3D point on both your images - i.e. the coordinates (i,j) of a pixel u_A in Image A and u_B in Image B which is a projection of the same point in 3D.
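As a complement to feature matching: since the ToF camera already gives metric depth, once the two cameras are calibrated (intrinsics plus the rotation/translation between them, which the planned marker-based calibration would provide, e.g. via OpenCV's stereo calibration), each depth pixel can be mapped into the colour image purely geometrically: back-project with the ToF intrinsics, transform into the colour camera frame, and re-project with the colour intrinsics. A minimal sketch, with all matrix names assumed:

```swift
import simd

/// Maps a ToF pixel with known depth to the corresponding colour-image pixel.
/// K_tof_inv and K_rgb are the 3x3 intrinsic matrices (ToF inverted); R and t are the
/// rigid transform from the ToF camera frame to the colour camera frame.
func colourPixel(forToFPixel pixel: SIMD2<Float>,
                 depth: Float,                    // z-depth along the ToF optical axis, metres
                 K_tof_inv: simd_float3x3,
                 K_rgb: simd_float3x3,
                 R: simd_float3x3,
                 t: SIMD3<Float>) -> SIMD2<Float> {
    // 1. Back-project the ToF pixel to a 3D point in the ToF camera frame.
    let ray = K_tof_inv * SIMD3<Float>(pixel.x, pixel.y, 1)   // has z == 1
    let pointToF = ray * depth
    // 2. Transform the point into the colour camera frame.
    let pointRGB = R * pointToF + t
    // 3. Project it with the colour camera intrinsics.
    let projected = K_rgb * pointRGB
    return SIMD2<Float>(projected.x / projected.z, projected.y / projected.z)
}
```

Pixels whose depth exceeds your background threshold can then simply be masked out of the colour image. Lens distortion is ignored here; in practice you would undistort both images first.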

texture minification filter in raytracing?

Can someone point me to a paper/algorithm/resource/whatever that tells me how to implement a texture minification filter (which applies when texels are smaller than pixels) in a raytracer?
Thanks!
Since you are using ray tracing, I suspect you are looking for high-quality filtering that changes sampling dynamically based on the amount of "error". Based on this assumption I would say take a look at "ray differentials". There's a nice paper on this here: http://graphics.stanford.edu/papers/trd/ - it takes effects like refraction and reflection into account.
Your answer to yourself sounds like the right approach, but since others may stumble across the page I'll add a resource link as requested. In addition to discussing mipmapping (ripmapping is basically more advanced mipmapping), they discuss the effects of reflection and refraction on derivatives and mip-level selection.
Homan Igehy. "Tracing Ray Differentials." 1999. Proceedings of SIGGRAPH. http://graphics.stanford.edu/papers/trd/
Upon closer reading I see that Rehno Lindeque mentioned this paper. At first I didn't realize that it was the right reference, because he says that the method samples dynamically based on the error of the sampling, which is incorrect. Filtering is done based on the size of the pixel's footprint and uses only one ray, just as you described.
Edit:
Another reference that might be useful ( http://www.cs.unc.edu/~awilson/class/238/#challenges ). Scroll to the section "Derivatives of Texture Coordinates." He suggests backward mapping of texture derivatives from the surface to the screen. I think this would be incorrect for reflected and refracted rays, but is possibly easier to implement and should be okay for primary rays.
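For primary rays, the derivative-based selection described in those references boils down to the familiar GPU rule: measure the pixel's footprint in texels from the texture-coordinate change across one pixel (obtained from ray differentials, or by offsetting the primary ray by one pixel) and take its log2. A small sketch, with all names assumed:

```swift
import Foundation
import simd

/// Picks a mip level from the texture-coordinate derivatives of one pixel.
func mipLevel(ddx: SIMD2<Float>,          // d(u,v) across one pixel horizontally
              ddy: SIMD2<Float>,          // d(u,v) across one pixel vertically
              textureSize: SIMD2<Float>,  // texture width and height in texels
              maxLevel: Int) -> Int {
    // Footprint extents in texels along the two screen directions.
    let extentX = simd_length(ddx * textureSize)
    let extentY = simd_length(ddy * textureSize)
    // Same rule as trilinear filtering on a GPU: level = log2(largest extent), clamped.
    let level = log2(max(max(extentX, extentY), 1))
    return min(max(Int(level.rounded()), 0), maxLevel)
}
```

In practice the fractional part of the level is kept and used to blend the two nearest mip levels, as the other answer mentions.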
I think you mean mipmapping.
Here is an article talking about using them.
But neither says how to choose which mipmap to use; often the two nearest levels (the bigger and the smaller mipmap) are blended.
Here's one more article about how Google Earth works, and it talks about how they mipmap the earth.
Thank you guys for your answers, but since I didn't find an appropriate technique I created something myself, which turned out to work very well:
I assume my ray to be a cone with a cone radius of half a pixel on the image plane. When the ray hits a surface, I calculate the ellipse which is projected onto the surface (the ellipse from the plane-cone intersection). Then, using the texture-coordinate derivatives at the intersection point, I project this ellipse into texture space. Now I know which part of the texture lies under my pixel and can subsample this area.
I also use ripmaps to improve the quality - and I choose the ripmap level based on the size of the ellipse in texture space.
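A sketch of that last step: choosing the ripmap level independently per axis from the ellipse's extent in texture space (all names here are assumptions about how the footprint is represented).

```swift
import Foundation

/// Picks anisotropic ripmap levels from the footprint ellipse's extents in UV space.
func ripMapLevels(extentU: Float,          // ellipse extent along u, in UV units (0...1)
                  extentV: Float,          // ellipse extent along v, in UV units (0...1)
                  textureWidth: Int,
                  textureHeight: Int,
                  maxLevelU: Int,
                  maxLevelV: Int) -> (levelU: Int, levelV: Int) {
    // Convert the footprint to texels, then take log2 per axis (ripmaps downsample
    // u and v independently, unlike square mipmaps).
    let texelsU = max(extentU * Float(textureWidth), 1)
    let texelsV = max(extentV * Float(textureHeight), 1)
    let levelU = min(max(Int(log2(texelsU).rounded()), 0), maxLevelU)
    let levelV = min(max(Int(log2(texelsV).rounded()), 0), maxLevelV)
    return (levelU, levelV)
}
```

Blending between neighboring levels (rather than rounding) avoids visible pops when the ellipse size changes gradually.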