Wrong image orientation after displayTransform call - swift

I am trying to get the image from the current ARFrame by using:
if let imageBuffer = sceneView.session.currentFrame?.capturedImage {
    let orientation = UIApplication.shared.statusBarOrientation
    let viewportSize = sceneView.bounds.size
    if let transformation = sceneView.session.currentFrame?.displayTransform(for: orientation, viewportSize: viewportSize) {
        let ciImage = CIImage(cvPixelBuffer: imageBuffer).transformed(by: transformation)
    }
}
For landscape, it works great. For portrait, I get the image at the wrong angle (rotated by 180°). Any idea why?
Output:
Expected:

First off, I should say that this is definitely an unpleasant bug.
The problem is that when you convert the portrait image contained in an ARFrame to a CIImage or CGImage, it loses its orientation and comes out rotated 180 degrees CCW. This issue affects only portrait images; landscape ones are not affected at all.
This happens because the portrait image carries no information about its orientation at the conversion stage, so the resulting CIImage or CGImage no longer knows which way is up.
To fix this, compare the "standard" landscape width/height with the "non-standard" portrait width/height, and if the values differ, rotate the image 180 degrees CW (or apply the .portraitUpsideDown orientation case).
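One possible reading of that check, as a minimal sketch (it reuses imageBuffer, viewportSize and transformation from the question's snippet; the exact comparison is my interpretation, not code from this answer):
// The capturedImage buffer always arrives in sensor (landscape) layout:
// width > height. If the viewport is portrait while the buffer is landscape,
// the widths/heights disagree and the result needs a 180° flip.
let bufferIsLandscape = CVPixelBufferGetWidth(imageBuffer) > CVPixelBufferGetHeight(imageBuffer)
let viewportIsPortrait = viewportSize.height > viewportSize.width

var ciImage = CIImage(cvPixelBuffer: imageBuffer).transformed(by: transformation)
if bufferIsLandscape && viewportIsPortrait {
    ciImage = ciImage.oriented(.down)   // .down corresponds to a 180° rotation
}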
Hope this helps.

Coordinate systems
We need to be very clear about which coordinate system we are working in.
We know UIKit's normalized coordinates have (0,0) in the top left and (1,1) in the bottom right, but this is not true of Core Image:
Due to Core Image's coordinate system mismatch with UIKit... (see here)
or Vision (including CoreML recognition):
Vision uses a normalized coordinate space from 0.0 to 1.0 with lower
left origin. (see here)
However, displayTransform uses the UIKit orientation:
A transform matrix that converts from normalized image coordinates in
the captured image to normalized image coordinates that account for
the specified parameters. Normalized image coordinates range from
(0,0) in the upper left corner of the image to (1,1) in the lower
right corner. (See here)
So, if you load a CVPixelBuffer into a CIImage and then try to apply
the displayTransform matrix, the result will be flipped (as you can see).
On top of that, the transform also scales and translates the image, which further messes it up.
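As a tiny hypothetical illustration of the mismatch, a point in the Vision/Core Image convention has to be flipped vertically before it agrees with the UIKit/displayTransform convention:
// Lower-left-origin (Vision / Core Image) vs. upper-left-origin (UIKit).
let visionPoint = CGPoint(x: 0.25, y: 0.10)                        // lower-left origin
let uikitPoint = CGPoint(x: visionPoint.x, y: 1.0 - visionPoint.y) // upper-left origin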
What display transform does
Display transform appears to be intended mainly for Metal or other lower-level drawing routines, which tend to match the Core Image orientation.
The transform scales the image and shifts it so that it "aspect fills" the specified viewport bounds.
If you are going to display the image in a UIImageView, the result will be reversed because their orientations differ. Furthermore, the image view performs the aspect-fill transformation for you,
so there is no reason to shift or scale, and thus no reason to use displayTransform at all. Just rotate the image to the proper orientation:
// apply an orientation. You can easily make a function
// which converts the screen orientation to this parameter
let rotated = CIImage(cvPixelBuffer: frame.capturedImage).oriented(...)
imageView.image = UIImage(ciImage: rotated)
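If you need that conversion, here is one possible helper (hypothetical, not from the original answer); the mapping assumes the camera delivers its buffer in the native landscape-right sensor orientation, so verify it on a device:
// Hypothetical helper: map the interface orientation to the value expected
// by CIImage.oriented(_:). Assumes a landscape-right native buffer layout.
func cgOrientation(for interfaceOrientation: UIInterfaceOrientation) -> CGImagePropertyOrientation {
    switch interfaceOrientation {
    case .portrait:           return .right
    case .portraitUpsideDown: return .left
    case .landscapeLeft:      return .down
    default:                  return .up   // .landscapeRight and unknown
    }
}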
If you want to overlay content on the image, such as by adding subviews to the UIImageView, then displayTransform can be helpful.
It translates image coordinates (in the UIKit orientation) into image-view coordinates
that line up with the displayed image, which has been shifted and scaled due to aspect fill.
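For example (a sketch, assuming frame, orientation and imageView from the code above), a normalized point in the captured image can be pushed through the transform and then scaled up to view points:
// Project a normalized image point (UIKit convention, (0,0) top-left to
// (1,1) bottom-right) into the image view's coordinate space.
let transform = frame.displayTransform(for: orientation, viewportSize: imageView.bounds.size)
let normalizedImagePoint = CGPoint(x: 0.5, y: 0.5)            // center of the captured image
let normalizedViewPoint = normalizedImagePoint.applying(transform)
let viewPoint = CGPoint(x: normalizedViewPoint.x * imageView.bounds.width,
                        y: normalizedViewPoint.y * imageView.bounds.height)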

Related

ARKit – Are the values entered for Reference Image dimensions important?

In ARKit, I am using the height and width of reference images (as entered by me into Xcode in the AR Resource Group) to overlay planes of the same size onto matched images. Regardless of whether I enter accurate reference image dimensions, ARKit accurately overlays the plane onto the real world (i.e., the plane correctly covers the matched image in the ARSCNView).
If I understand correctly, estimatedScaleFactor tells me the difference between the true size of the reference image and the values I entered in the Resource Group.
My question is: if ARKit is able to figure out the true size of the object shown in the reference image, when/why would I need to worry about entering accurate height and width values?
(My reference images are public art and accurately measuring them is sometimes difficult.)
Does ARKit have to work harder, or are there scenarios where I would stop getting good results without accurate Reference Image measurements?
ADDITIONAL INFO: As a concrete example, if I was matching movie posters, I would take a photo of the poster, load it into an AR Resource Group, and arbitrarily set the width to something like one meter (allowing Xcode to set the other dimension based on the proportions of the image).
Then, when ARKit matches the image, I would put a plane on it in renderer(_:didAdd:for:):
let plane = SCNPlane(width: referenceImage.physicalSize.width,
                     height: referenceImage.physicalSize.height)
plane.firstMaterial?.diffuse.contents = UIColor.planeColor
let planeNode = SCNNode(geometry: plane)
planeNode.eulerAngles.x = -.pi / 2
node.addChildNode(planeNode)
This appears to work as desired (the plane convincingly overlays the matched image) in spite of the fact that the dimensions I entered for the reference image are inaccurate. (And yes, estimatedScaleFactor does give a good approximation of how much my arbitrary dimensions are off.)
So, what I am trying to understand is whether this will break down in some scenarios (and when, and what I need to learn to understand why!). If my reference image dimensions are not accurate, will that negatively impact placing planes or other objects onto the node provided by ARKit?
Put another way, if ARKit is correctly understanding the world and reference images without accurate ref image measurements, does that mean I can get away with never entering accurate measurements for ref images?
As the official documentation suggests:
The default value of estimatedScaleFactor (a factor between the initial size and the estimated physical size) is 1.0, which means that a version of this image that ARKit recognizes in the physical environment exactly matches its reference image physicalSize.
Otherwise, ARKit automatically corrects the image anchor's transform when estimatedScaleFactor is a value other than 1.0. This adjustment in turn, corrects ARKit's understanding of where the image anchor is located in the physical environment.
var estimatedScaleFactor: CGFloat { get }
For a more precise scale of your 3D model, you measure your real-world image; then, while the AR app is running, ARKit measures the reference image it observes. ARImageAnchor stores the result in its estimatedScaleFactor property, so ARKit registers the difference in scale and applies the corrected scale, making your model correspondingly bigger or smaller. That is what estimatedScaleFactor is for.
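A sketch (not from the original answer) of reading that value in the ARSCNViewDelegate and resizing an overlay plane accordingly:
func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    guard let imageAnchor = anchor as? ARImageAnchor else { return }
    // estimatedScaleFactor is the ratio between the observed size and the
    // physicalSize you entered; use it to keep the plane covering the image.
    let scale = Float(imageAnchor.estimatedScaleFactor)
    node.childNodes.first?.scale = SCNVector3(x: scale, y: scale, z: scale)
}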
However, there's also an automatic option:
To accurately recognize the position and orientation of an image in the AR environment, ARKit must know the image's physical size. You provide this information when creating an AR reference image in your Xcode project's asset catalog, or when programmatically creating an ARReferenceImage.
When you want to recognize different-sized versions of a reference image, you set automaticImageScaleEstimationEnabled to true, and in this case, ARKit disregards physicalSize.
var automaticImageScaleEstimationEnabled: Bool { get set }
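A minimal configuration sketch, assuming an ARSCNView called sceneView and an asset catalog group named "AR Resources":
let configuration = ARWorldTrackingConfiguration()
if let referenceImages = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: nil) {
    configuration.detectionImages = referenceImages
}
// With this flag on, ARKit estimates the physical size itself, and the
// physicalSize entered in the asset catalog is disregarded.
configuration.automaticImageScaleEstimationEnabled = true
sceneView.session.run(configuration)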

Setup Screen Boundary on iPad Pro 12.9 with SKPhysics

I'm trying to create a boundary of physics for the iPad Pro 12.9
This is how I'm doing it:
override func didMove(to view: SKView) {
    physicsWorld.contactDelegate = self

    let sceneBody = SKPhysicsBody(edgeLoopFrom: self.frame)
    sceneBody.friction = 0
    self.physicsBody = sceneBody
    ....
}
But the Y boundary is way off in landscape (extending much lower and higher than the actual screen) and a little off in portrait, while the X is right in both.
I don't know what I'm doing wrong.
Update
I've added a print statement to the above, and it's showing the maxX and maxY of self.frame to be 375 and 667 respectively, in landscape mode. Neither of those numbers is what it should be, as far as I can tell, yet the X value works correctly whilst Y is way off the top and bottom of the screen.
This iPad model's screen resolution is 2732x2048 pixels (half that in points), so I don't see a correlation between those numbers and the reported frame size.
This has something to do with the way you're scaling the scene. When presenting the scene, you may be setting its scaleMode property, which is of type SKSceneScaleMode. There are four different modes:
fill: Each axis is scaled independently so the scene fills the whole screen; the aspect ratio is not preserved.
aspectFill: The scene is scaled to fill the screen while keeping its aspect ratio fixed, so part of the scene can extend past the screen edges along one axis. This is the one your scene is probably set to, and it is why the edge loop built from self.frame overshoots in Y (see the sketch after this list).
aspectFit: The scene is scaled to fit inside the screen while keeping its aspect ratio; if the scene's aspect ratio differs from the screen's, there will be letterboxing.
resizeFill: The scene is not scaled at all; it is resized so its dimensions match the view's.
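One way to make the edge loop hug the visible screen is to size the scene to the view and use .resizeFill. This is only a sketch, and not necessarily what you want if your game relies on a fixed scene size:
class GameScene: SKScene, SKPhysicsContactDelegate {
    override func didMove(to view: SKView) {
        // With .resizeFill the scene's frame matches the view's bounds
        // (e.g. 1366 x 1024 points on the 12.9" iPad Pro in landscape),
        // so an edge loop built from self.frame follows the screen edges.
        scaleMode = .resizeFill
        size = view.bounds.size

        physicsWorld.contactDelegate = self
        let sceneBody = SKPhysicsBody(edgeLoopFrom: frame)
        sceneBody.friction = 0
        physicsBody = sceneBody
    }
}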

Relationship of video coordinates before/after resizing

I have a 720x576 video that was played full screen on a display with 1280x960 resolution, together with the corresponding eye-tracker gaze coordinate data.
I have built gaze-tracking visualization code, but the one thing I am not sure about is how to convert my input coordinates to match the original video.
So, does anybody have an idea of what to do?
The native aspect ratio of the video (720/576 = 1.25) does not match the aspect ratio at which it was displayed (1280/960 = 1.33). i.e. the pixels didn't just get scaled in size, but in shape.
So assuming your gaze coordinates were calibrated to match the physical screen (1280 × 960), then you will need to independently scale the x coordinates by 720/1280 = 0.5625 and the y coordinates by 576/960 = 0.6.
Note that this will distort the actual gaze behaviour (horizontal saccades are being scaled by more than vertical ones). Your safest option would actually be to rescale the video to have the same aspect ratio as the screen, and project the gaze coordinates onto that. That way, they won't be distorted, and the slightly skewed movie will match what was actually shown to the subjects.
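A small sketch of that per-axis scaling (it assumes gaze samples in 1280x960 screen pixels and maps them into the native 720x576 video frame):
let screenSize = (width: 1280.0, height: 960.0)
let videoSize = (width: 720.0, height: 576.0)

// Independent per-axis scaling; note the slight distortion because the
// aspect ratios differ (1.25 vs. 1.33).
func videoCoordinates(fromScreenX x: Double, y: Double) -> (x: Double, y: Double) {
    return (x * videoSize.width / screenSize.width,    // x * 0.5625
            y * videoSize.height / screenSize.height)  // y * 0.6
}

let centre = videoCoordinates(fromScreenX: 640, y: 480)   // screen centre -> (360, 288)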

How to rotate an image with content on the same spot?

I have an image like below
then I want to rotate it, but I don't want its position to be changed.
For example the output should look like below
If I do imrotate, it will change its position. Is there any other way to rotate this without changing its position?
The imrotate function rotates the entire image by the specified angle. What you want is to rotate only a part of the image, so you have to specify which part to rotate; formally speaking, this is the rectangle in which the symbol is located.
The coordinates of this rectangle can be found by selecting all rows and columns where any pixel is black. This can be done by taking the sum over all rows, finding the first and last non-zero entries there, and doing the same over all columns.
sx=find(sum(im==0,1),1,'first');
ex=find(sum(im==0,1),1,'last');
sy=find(sum(im==0,2),1,'first');
ey=find(sum(im==0,2),1,'last');
The relevant part of the image is then
im(sy:ey,sx:ex)
Now you can rotate only this part of the image and save it to the same location within the whole image:
im(sy:ey,sx:ex) = imrotate(im(sy:ey,sx:ex),180);
with the desired result:
Note: this will only work for 180° rotations, such as the example you provided. If you rotate by any other angle, e.g. 90° or an arbitrary angle such as 23°, the output of imrotate will generally not have the same size as the input, so the assignment im(sy:ey,sx:ex) = ... will throw an error.

Marking face and eyes on image

I am working on an app that detects a face and marks the eyes and mouth on an image. I have detected the face, eyes and mouth using CIDetector, but the positions it returns are relative to the original image, not to the image view on which I have to mark them. For example, for an image of 720 x 720, the face and eye positions it returns are relative to that 720 x 720 size. The problem is that I have to show the annotated eyes and face on an image view of size 320 x 320. Please advise me: how can I map the face position returned by CIDetector to the position of the face on the image view?
You can solve this by using the ratio of the image view's size to the image's size.
The following is something really simple that could be used to solve your problem:
// 'returnedPoint' is the eye position returned by CIDetector
CGFloat ratio = 320.0 / 720.0;
// In general: CGFloat ratio = yourImageView.frame.size.width / yourImage.size.width;
CGPoint pointOnImageView = CGPointMake(ratio * returnedPoint.x, ratio * returnedPoint.y);
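For a Swift project, the same idea could look like this (a sketch; it assumes the image and the image view are both square, as in the 720x720 / 320x320 example, so a single ratio is enough):
// Map a point from image coordinates into image-view coordinates using the
// width ratio between the two.
func point(onImageView returnedPoint: CGPoint, imageSize: CGSize, imageViewSize: CGSize) -> CGPoint {
    let ratio = imageViewSize.width / imageSize.width   // e.g. 320 / 720
    return CGPoint(x: returnedPoint.x * ratio, y: returnedPoint.y * ratio)
}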