ARKit – Are the values entered for Reference Image dimensions important? - swift

In ARKit, I am using the height and width of reference images (as entered by me into Xcode in the AR Resource Group) to overlay planes of the same size onto matched images. Regardless of whether I enter accurate reference image dimensions, ARKit accurately overlays the plane onto the real world (i.e., the plane correctly covers the matched image in the ARSCNView).
If I understand correctly, estimatedScaleFactor tells me the difference between the true size of the reference image and the values I entered in the Resource Group.
My question is: if ARKit is able to figure out the true size of the object shown in the reference image, when and why would I need to worry about entering accurate height and width values?
(My reference images are public art and accurately measuring them is sometimes difficult.)
Does ARKit have to work harder, or are there scenarios where I would stop getting good results without accurate Reference Image measurements?
ADDITIONAL INFO: As a concrete example, if I was matching movie posters, I would take a photo of the poster, load it into an AR Resource Group, and arbitrarily set the width to something like one meter (allowing Xcode to set the other dimension based on the proportions of the image).
Then, when ARKit matches the image, I would put a plane on it in renderer(_:didAdd:for:):
let plane = SCNPlane(width: referenceImage.physicalSize.width,
                     height: referenceImage.physicalSize.height)
plane.firstMaterial?.diffuse.contents = UIColor.planeColor
let planeNode = SCNNode(geometry: plane)
planeNode.eulerAngles.x = -.pi / 2
This appears to work as desired--the plane convincingly overlays the matched image--in spite of the fact that the dimensions I entered for the reference image are inaccurate. (And yes, estimatedScaleFactor does give a good approximation of how far off my arbitrary dimensions are.)
So, what I am trying to understand is whether this will break down in some scenarios (and when, and what I need to learn to understand why!). If my reference image dimensions are not accurate, will that negatively impact placing planes or other objects onto the node provided by ARKit?
Put another way, if ARKit is correctly understanding the world and reference images without accurate ref image measurements, does that mean I can get away with never entering accurate measurements for ref images?

As official documentation suggests:
The default value of estimatedScaleFactor (a factor between the initial size and the estimated physical size) is 1.0, which means that a version of this image that ARKit recognizes in the physical environment exactly matches its reference image physicalSize.
Otherwise, ARKit automatically corrects the image anchor's transform when estimatedScaleFactor is a value other than 1.0. This adjustment in turn, corrects ARKit's understanding of where the image anchor is located in the physical environment.
var estimatedScaleFactor: CGFloat { get }
For a more precise scale of your 3D model, you need to measure the real-world image and enter its size. While the AR app is running, ARKit measures the image it actually observes. ARImageAnchor stores the result in its estimatedScaleFactor property: ARKit registers the difference between the size you entered and the size it estimated, and that factor can then be applied to the 3D model so that the model becomes bigger or smaller to match. That is what estimatedScaleFactor is for.
However, ARKit can also estimate the scale automatically:
To accurately recognize the position and orientation of an image in the AR environment, ARKit must know the image's physical size. You provide this information when creating an AR reference image in your Xcode project's asset catalog, or when programmatically creating an ARReferenceImage.
When you want to recognize different-sized versions of a reference image, you set automaticImageScaleEstimationEnabled to true, and in this case, ARKit disregards physicalSize.
var automaticImageScaleEstimationEnabled: Bool { get set }
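As a rough illustration of the arithmetic, here is a minimal sketch that turns the size entered in the asset catalog plus a scale factor into an estimated real-world size. It assumes the factor is the ratio of the estimated physical size to the entered size (my reading of the documentation quoted above). SizeInMeters and estimatedPhysicalSize are hypothetical names, used here as plain stand-ins so the snippet runs without ARKit.

```swift
import Foundation

/// Hypothetical stand-in for a width/height pair in meters (e.g. ARReferenceImage.physicalSize).
struct SizeInMeters {
    var width: Double
    var height: Double
}

/// Multiplies the entered size by the scale factor.
/// Assumption: scaleFactor = estimated physical size / entered size.
func estimatedPhysicalSize(entered: SizeInMeters, scaleFactor: Double) -> SizeInMeters {
    return SizeInMeters(width: entered.width * scaleFactor,
                        height: entered.height * scaleFactor)
}

// A poster entered as 1.0 m x 1.5 m that ARKit estimates to be 60% of that size.
let entered = SizeInMeters(width: 1.0, height: 1.5)
let estimated = estimatedPhysicalSize(entered: entered, scaleFactor: 0.6)
print(estimated.width, estimated.height)
```

In an ARKit delegate callback the two inputs would presumably come from imageAnchor.referenceImage.physicalSize and imageAnchor.estimatedScaleFactor.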


Unity, Relative dimensions of gameobjects

I saw some documents saying that there is no concept of length in Unity. All you can do to determine the dimensions of a GameObject is to use its Scale.
Then how could I set the overall relative dimensions between the gameobjects?
For example, the dimension of a 1:1:1 plane is obviously different from a 1:1:1 sphere! Then how could I know what's the relative ratios between the plane and the sphere? 1 unit length of the plane is equal to how much unit of the diameter of the sphere!? Otherwise how could I know if I had set everything in the right proportion?
Well, what you say is right, but consider that objects could have a collider. And, in case of a sphere, you could obtain the radius with SphereCollider.radius.
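Also, consider Bounds.extents, which is relative to the object's bounding box.
Again, considering the Sphere, you can obtain the diameter with:
// Mesh.bounds is in the mesh's local space, so it does not include the Transform's scale.
Mesh mesh = GetComponent<MeshFilter>().mesh;
Bounds bounds = mesh.bounds;
float diameter = bounds.extents.x * 2; // extents are half the size of the bounding box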
All GameObjects in Unity have a Transform component, which determines their position, rotation and scale. Most 3D objects also have a MeshFilter component, which contains a reference to the Mesh object.
The Mesh contains the actual shape of the object, for example the six faces of a cube or the faces of a sphere. Unity provides a handful of built-in objects (cube, sphere, cylinder, plane, quad), but this is just a 'starter kit'. Most of those built-in objects are 1 unit in size, but this is purely because the vertices have been placed in those positions (so you need to scale by 2 to get a size of 2 units).
But there is no limit on positions within a mesh: you can have a tiny, tiny object or a whole terrain object, and have them massively different in size despite keeping their scale at 1.
You should try to learn some 3D modelling application to create arbitrary objects.
Alternatively, try installing a plugin called ProBuilder, which used to be quite expensive and is now free (since it was acquired by Unity), and which enables in-editor modelling.
Scales are best kept at one, but it's good to have the option to scale: this way you can re-use the sphere mesh or the cube mesh at different scales (less wasted memory).
In most Unity applications you pick an arbitrary convention for what one unit means.
So typically 1 m = 1 unit.
All things that are 1 unit tall are 1 m tall.
If you import a mesh from a modelling program that is the wrong size, scale it to exactly one meter (use a standard 1,1,1 cube as reference). Then, stick it inside an empty game object to “convert” it into your game’s proper scale. So now if you scale the empty object’s y axis to 2, the object is now 2 meters tall.
A better solution is to keep every object's highest parent in the hierarchy at 1,1,1 scale. Using the 1,1,1 reference cube, scale your object to a size that looks proper. So, for example, if I had a model of a person I'd want it to be scaled to be roughly twice as tall as the cube. Then drag it into an empty object of 1,1,1 scale. This way, everything in your scene's "normal" size is 1,1,1. If you want to double the size of something, you'd then make it 2,2,2. In practice this is much more useful than the first option.
Now, if you change its position by 1 unit, it effectively moves by what would look like the proper 1 m as well.
This process also lets you change where the "bottom" of an object is. You can change the position of the object inside the empty, making an "offset". This is useful for making models stand right on the ground with position y=0.

ARKit project point with previous device position

I'm combining ARKit with a CNN to constantly update ARKit nodes when they drift. So:
1. Get an estimate of the node position with ARKit and place a virtual object in the world
2. Use the CNN to get the estimated 2D location of the object
3. Update the node position accordingly (to refine its location in 3D space)
The problem is that #2 takes 0.3 s or so. Therefore I can't use sceneView.unprojectPoint, because by the time the CNN result arrives the device has moved, while the 2D point corresponds to the device's world position from #1.
How do I calculate the 3D vector from my old location to the CNN's 2D point?
unprojectPoint is just a matrix-math convenience function similar to those found in many graphics-oriented libraries (like DirectX, old-style OpenGL, Three.js, etc). In SceneKit, it's provided as a method on the view, which means it operates using the model/view/projection matrices and viewport the view currently uses for rendering. However, if you know how that function works, you can implement it yourself.
An Unproject function typically does two things:
1. Convert viewport coordinates (pixels) to the clip-space coordinate system (-1.0 to 1.0 in all directions).
2. Reverse the projection transform (assuming some arbitrary Z value in clip space) and the view (camera) transform to get to 3D world-space coordinates.
Given that knowledge, we can build our own function. (Warning: untested.)
import simd
import CoreGraphics

func unproject(screenPoint: float3, // see below for Z depth hint discussion
               modelView: float4x4,
               projection: float4x4,
               viewport: CGRect) -> float3 {
    // viewport to clip: subtract viewport origin, divide by size,
    // scale/offset from 0...1 to -1...1 coordinate space
    // (z has no origin offset; assumes screenPoint.y already grows upward like clip space)
    let origin = float3(Float(viewport.origin.x), Float(viewport.origin.y), 0)
    let size = float3(Float(viewport.width), Float(viewport.height), 1)
    let clip = (screenPoint - origin) / size * 2 - float3(repeating: 1)
    // apply the reverse of the model-view-projection transform
    let inversePM = (projection * modelView).inverse
    let result = inversePM * float4(clip.x, clip.y, clip.z, 1.0)
    return float3(result.x, result.y, result.z) / result.w // perspective divide
}
Now, to use it... The modelView matrix you pass to this function is the inverse of ARCamera.transform, and you can also get projectionMatrix directly from ARCamera. So, if you're grabbing a 2D position at one point in time, grab the camera matrices then, too, so that you can work backward to 3D as of that time.
There's still the issue of that "Z depth hint" I mentioned: when the renderer projects 3D to 2D it loses information (one of those D's, actually). So you have to recover or guess that information when you convert back to 3D: the screenPoint you pass in to the above function is the x and y pixel coordinates, plus a depth value between 0 and 1. Zero is closer to the camera, 1 is farther away. How you make use of that sort of depends on how the rest of your algorithm is designed. (At the very least, you can unproject both Z=0 and Z=1, and you'll get the endpoints of a line segment in 3D, with your original point somewhere along that line.)
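One possible way to use that line segment, sketched here purely as an illustration and not as something the approach above prescribes: take the two unprojected endpoints and pick the point on the segment that lies closest to ARKit's earlier 3D estimate of the node. Vec3 and closestPoint are hypothetical names, and plain Doubles are used so the snippet runs without simd or ARKit.

```swift
import Foundation

/// Hypothetical minimal 3D vector type (a stand-in for simd's float3).
struct Vec3 {
    var x, y, z: Double
    static func + (a: Vec3, b: Vec3) -> Vec3 { return Vec3(x: a.x + b.x, y: a.y + b.y, z: a.z + b.z) }
    static func - (a: Vec3, b: Vec3) -> Vec3 { return Vec3(x: a.x - b.x, y: a.y - b.y, z: a.z - b.z) }
    static func * (a: Vec3, s: Double) -> Vec3 { return Vec3(x: a.x * s, y: a.y * s, z: a.z * s) }
    func dot(_ o: Vec3) -> Double { return x * o.x + y * o.y + z * o.z }
}

/// Returns the point on the segment near...far that is closest to `estimate`.
/// `near` and `far` are assumed to be the world-space results of unprojecting Z=0 and Z=1.
func closestPoint(onSegmentFrom near: Vec3, to far: Vec3, toward estimate: Vec3) -> Vec3 {
    let direction = far - near
    let lengthSquared = direction.dot(direction)
    guard lengthSquared > 0 else { return near } // degenerate segment
    // Project the estimate onto the segment and clamp to its endpoints.
    let t = max(0, min(1, (estimate - near).dot(direction) / lengthSquared))
    return near + direction * t
}

// The old estimate sits 1 m to the side of a ray that points down the -Z axis.
let refined = closestPoint(onSegmentFrom: Vec3(x: 0, y: 0, z: 0),
                           to: Vec3(x: 0, y: 0, z: -10),
                           toward: Vec3(x: 1, y: 0, z: -3))
print(refined.x, refined.y, refined.z)
```

This keeps the depth ARKit already estimated and only corrects the position sideways, which may or may not suit the rest of your pipeline.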
Of course, whether this can actually be put together with your novel CNN-based approach is another question entirely. But at least you learned some useful 3D graphics math!

Swift: How to set size of particle effects with SCNParticleSystem?

I have a simple ARKit app (using SceneKit) with cubes floating in space that I am shooting with other objects. I created a .scnp file with Fire as a template and customized it to sort of look like an explosion.
Everything looks good and works on collision, but my whole particle effect takes up the whole screen. I tried every property available in the .scnp file, but the size is still enormous.
How can I set the effect area size? For example, to be slightly bigger than my cubes (which are 0.1 meters wide).
This is how I run the explosion:
let fire = SCNParticleSystem(named: "explosion.scnp", inDirectory: nil)
contactNode is my target cube.
The particle system property you’re looking for is particleSize. (There’s a control for setting that property in the Xcode particle system GUI editor, but I forget what it’s labeled...)
The docs for that property say:
The rendered size, in units of the scene’s world coordinate space, of the particle image.
In ARKit, scene units are the same as real-world meters. So while a particle size of, say, 10x10 might make sense in some arbitrary scene, in AR that makes each particle the size of a house. You probably want values somewhere in the scale of millimeters to centimeters (0.001 - 0.01).
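A minimal sketch of setting that property in code, assuming the explosion.scnp file from the question. The helper suggestedParticleSize and its 10% fraction are my own assumptions for illustration, not anything SceneKit provides. The SceneKit part is wrapped in canImport so the arithmetic can still be run on its own.

```swift
import Foundation
#if canImport(SceneKit)
import SceneKit
#endif

/// Hypothetical helper: choose a particle size (in meters) as a fraction of the target's width.
/// The default fraction of 0.1 is an arbitrary starting point to tune by eye.
func suggestedParticleSize(targetWidth: Double, fraction: Double = 0.1) -> Double {
    return targetWidth * fraction
}

#if canImport(SceneKit)
/// Loads the particle system and shrinks its particles to AR scale.
func makeExplosion(targetWidth: Double) -> SCNParticleSystem? {
    guard let fire = SCNParticleSystem(named: "explosion.scnp", inDirectory: nil) else {
        return nil // file missing from the bundle
    }
    fire.particleSize = CGFloat(suggestedParticleSize(targetWidth: targetWidth))
    return fire
}
#endif

// For a cube 0.1 m wide this gives particles about 1 cm across.
let particleSize = suggestedParticleSize(targetWidth: 0.1)
print(particleSize)
```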

Optical lense distance from an object

I am using a Raspberry Pi camera, and the problem at hand is how to find the best position for it in order to fully see an object.
The object looks like this:
The question is how to find the perfect position, given that the camera is placed in the centre of the above image. Ideally the camera will capture the object only, as the idea is to get the camera as close as possible.
Take a picture with your camera, save it as a JPG, then open it in a viewer that allows you to inspect the EXIF header. If you are lucky you should see the focal length (in mm) and the sensor size. If the latter is missing, you can probably work it out from the sensor's spec sheet (see here to start). From the two quantities you can work out the angle of the field of view: HorizFOV = 2 * atan(0.5 * sensor_width / focal_length), VertFOV = 2 * atan(0.5 * sensor_height / focal_length). (Without the factor of 2 you get the half-angle, measured from the optical axis.) From these angles you can derive an approximate distance from your subject that will keep it fully in view.
Note that these are only approximations. Nonlinear lens distortion will produce a slightly larger effective FOV, especially near the corners.
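Here is a small sketch of that calculation, using the full-angle form FOV = 2 * atan(0.5 * sensor_size / focal_length) and a simple pinhole model without distortion. The sensor and focal-length numbers are example values I am assuming for a Camera Module v2, so check your own EXIF data or spec sheet. The function names are hypothetical.

```swift
import Foundation

/// Full field-of-view angle in radians for one sensor dimension (pinhole model).
func fieldOfView(sensorSize: Double, focalLength: Double) -> Double {
    return 2 * atan(0.5 * sensorSize / focalLength)
}

/// Distance at which an object of the given size exactly fills the given field of view.
func distanceToFit(objectSize: Double, fov: Double) -> Double {
    return (objectSize / 2) / tan(fov / 2)
}

/// Smallest distance that keeps both the width and the height of the object in view.
func minimumDistance(objectWidth: Double, objectHeight: Double,
                     sensorWidth: Double, sensorHeight: Double,
                     focalLength: Double) -> Double {
    let horizontal = fieldOfView(sensorSize: sensorWidth, focalLength: focalLength)
    let vertical = fieldOfView(sensorSize: sensorHeight, focalLength: focalLength)
    return max(distanceToFit(objectSize: objectWidth, fov: horizontal),
               distanceToFit(objectSize: objectHeight, fov: vertical))
}

// Assumed example: 3.68 mm x 2.76 mm sensor, 3.04 mm focal length, object 0.2 m x 0.1 m.
let distance = minimumDistance(objectWidth: 0.2, objectHeight: 0.1,
                               sensorWidth: 3.68, sensorHeight: 2.76,
                               focalLength: 3.04)
print(distance) // in meters, same unit as the object size
```

Sensor size and focal length only need to share a unit (mm here); the result comes out in the unit used for the object. In practice you would add some margin on top of this distance.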

3d reconstruction from 2 views

I'm studying 3D reconstruction from two views with a fixed, known camera focal length. Something that is unclear to me: does triangulation give us the real-world scale of an object, or is the scale of the result different from the actual one? If the scale differs from the actual size, how can I find the depth of points from it? I was wondering whether I need more information to recover the real-world scale of the object.
Scale is arbitrary in SfM tasks, so the result may differ in every reconstruction, since points are initially projected at an arbitrary depth value.
You need at least one known distance in your scene to recover the absolute (real-world) scale. You can include one object with known size in your scene so you will be able to convert your scale afterwards.
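The conversion itself is just one multiplication per point. A minimal sketch, assuming you can identify two reconstructed points whose real-world separation you have measured (Point3 and rescale are hypothetical names):

```swift
import Foundation

/// Hypothetical reconstructed 3D point in arbitrary SfM units.
struct Point3 {
    var x, y, z: Double
}

/// Euclidean distance between two points.
func distance(_ a: Point3, _ b: Point3) -> Double {
    let dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z
    return (dx * dx + dy * dy + dz * dz).squareRoot()
}

/// Rescales all points so that the reference pair ends up `knownLength` apart.
/// The reference points must not coincide.
func rescale(points: [Point3], referenceA: Point3, referenceB: Point3,
             knownLength: Double) -> [Point3] {
    let scale = knownLength / distance(referenceA, referenceB)
    return points.map { Point3(x: $0.x * scale, y: $0.y * scale, z: $0.z * scale) }
}

// Two reconstructed points are 2 units apart, but the real object edge measures 0.5 m.
let cloud = [Point3(x: 0, y: 0, z: 4), Point3(x: 2, y: 0, z: 4)]
let metric = rescale(points: cloud, referenceA: cloud[0], referenceB: cloud[1],
                     knownLength: 0.5)
print(metric[1].x, metric[1].z)
```

After rescaling, the z values (depths in the camera frame of the reconstruction) are in the same real-world unit as the known length.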