I am trying to merge two images using VNImageHomographicAlignmentObservation. I am currently getting a 3x3 matrix that looks like this:
simd_float3x3([[0.99229, -0.00451023, -4.32607e-07],
               [0.00431724, 0.993118, 2.38839e-07],
               [-72.2425, -67.9966, 0.999288]])
But I don't know how to use these values to merge the two images into one. There doesn't seem to be any documentation on what these values even mean. I found some information on transformation matrices here: Working with matrices.
But so far nothing else has helped me... Any suggestions?
My Code:
func setup() {
    let floatingImage = UIImage(named: "DJI_0333")!
    let referenceImage = UIImage(named: "DJI_0327")!

    let request = VNHomographicImageRegistrationRequest(targetedCGImage: floatingImage.cgImage!, options: [:])
    let handler = VNSequenceRequestHandler()
    try! handler.perform([request], on: referenceImage.cgImage!)

    if let results = request.results as? [VNImageHomographicAlignmentObservation] {
        print("Perspective warp found: \(results.count)")
        results.forEach { observation in
            // A matrix with 3 rows and 3 columns.
            let matrix = observation.warpTransform
            print(matrix)
        }
    }
}
This homography matrix H describes how to project one of your images onto the image plane of the other image. To transform each pixel to its projected location, you can compute its projected location x' = H * x using homogeneous coordinates (basically take your 2D image coordinate, add a 1.0 as the third component, apply the matrix H, and go back to 2D by dividing by the third component of the result).
The most efficient way to do this for every pixel is to write this matrix multiplication in homogeneous space using Core Image. Core Image offers multiple shader kernel types: CIColorKernel, CIWarpKernel and CIKernel. For this task we only want to transform the location of each pixel, so a CIWarpKernel is what you need. Using the Core Image Shading Language, that would look as follows:
import CoreImage

let warpKernel = CIWarpKernel(source:
    """
    kernel vec2 warp(mat3 homography)
    {
        vec3 homogen_in = vec3(destCoord().x, destCoord().y, 1.0); // create homogeneous coord
        vec3 homogen_out = homography * homogen_in;                // transform by homography
        return homogen_out.xy / homogen_out.z;                     // back to normal 2D coordinate
    }
    """
)
Note that the shader wants a mat3 called homography, which is the shading-language equivalent of the simd_float3x3 matrix H. When calling the shader, the matrix is expected to be stored in a CIVector; to convert it, use:
let (col0, col1, col2) = yourHomography.columns
let homographyCIVector = CIVector(values:[CGFloat(col0.x), CGFloat(col0.y), CGFloat(col0.z),
CGFloat(col1.x), CGFloat(col1.y), CGFloat(col1.z),
CGFloat(col2.x), CGFloat(col2.y), CGFloat(col2.z)], count: 9)
When you apply the CIWarpKernel to an image, you have to tell CoreImage how big the output should be. To merge the warped and reference image, the output should be big enough to cover the whole projected and original image. We can compute the size of the projected image by applying the homography to each corner of the image rect (this time in Swift, CoreImage calls this rect the extent):
/**
 * Convert a 2D point to a homogeneous coordinate, transform by the provided homography,
 * and convert back to a non-homogeneous 2D point.
 */
func transform(_ point: CGPoint, by homography: matrix_float3x3) -> CGPoint
{
    let inputPoint = float3(Float(point.x), Float(point.y), 1.0)
    var outputPoint = homography * inputPoint
    outputPoint /= outputPoint.z
    return CGPoint(x: CGFloat(outputPoint.x), y: CGFloat(outputPoint.y))
}

func computeExtentAfterTransforming(_ extent: CGRect, with homography: matrix_float3x3) -> CGRect
{
    let points = [transform(extent.origin, by: homography),
                  transform(CGPoint(x: extent.origin.x + extent.width, y: extent.origin.y), by: homography),
                  transform(CGPoint(x: extent.origin.x + extent.width, y: extent.origin.y + extent.height), by: homography),
                  transform(CGPoint(x: extent.origin.x, y: extent.origin.y + extent.height), by: homography)]

    var (xmin, xmax, ymin, ymax) = (points[0].x, points[0].x, points[0].y, points[0].y)
    points.forEach { p in
        xmin = min(xmin, p.x)
        xmax = max(xmax, p.x)
        ymin = min(ymin, p.y)
        ymax = max(ymax, p.y)
    }
    let result = CGRect(x: xmin, y: ymin, width: xmax - xmin, height: ymax - ymin)
    return result
}
let ciFloatingImage = CIImage(image: floatingImage)!
let warpedExtent = computeExtentAfterTransforming(ciFloatingImage.extent, with: homography.inverse)
let outputExtent = warpedExtent.union(ciFloatingImage.extent)
Now you can create a warped version of your floating image:
let ciWarpedImage = warpKernel.apply(extent: outputExtent,
                                     roiCallback: { (index, rect) in
                                         return computeExtentAfterTransforming(rect, with: homography.inverse)
                                     },
                                     image: ciFloatingImage,
                                     arguments: [homographyCIVector])!
The roiCallback is there to tell CoreImage which part of the input image is needed to compute a certain part of the output. CoreImage uses this to apply the shader on parts of the image block by block, such that it can process huge images. (See Creating Custom Filters in Apple's docs). A quick hack would be to always return CGRect.infinite here, but then CoreImage can't do any block-wise magic.
And lastly, create a composite image of the reference image and the warped image:
let ciReferenceImage = CIImage(image: referenceImage)!
let ciResultImage = ciWarpedImage.composited(over: ciReferenceImage)
let resultImage = UIImage(ciImage: ciResultImage)
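Note that UIImage(ciImage:) only wraps the Core Image recipe without rendering it, which can behave unexpectedly in some contexts (for example when exporting the image). If that bites you, a minimal sketch of rendering the composite explicitly through a CIContext, reusing the names from above, would be:
// Render the composited CIImage into a bitmap-backed UIImage.
// Creating a CIContext is relatively expensive, so a real app would keep one around and reuse it.
let context = CIContext()
if let cgResult = context.createCGImage(ciResultImage, from: ciResultImage.extent) {
    let renderedImage = UIImage(cgImage: cgResult)
    // use renderedImage, e.g. assign it to an image view or write it to disk
}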
Related
I'm implementing a Gaussian-subtract function that extracts features of 2D Gaussian-like objects from an input image. The algorithm is as follows:
input image -> contrast-stretch and threshold to 255 -> stack of sigma(n)-blurred intermediate 2D arrays B -> stack of (input - B(n)) intermediate 2D arrays as C -> max value + index of each C(n) 2D array as D -> draw circle with sigma(n) for all in B -> repeat the cycle from C until the max value reaches 0.
I found some MTLFunction objects for 2D Gaussian blur, and I can create my own shaders for the subtract, max-value and circle-drawing steps, but I am unsure how the MTLTexture2D objects can be cycled across multiple passes of the algorithm without writing redundant-looking code in my filter class.
Can anyone point me to a link where I can figure out:
1. whether I can use a custom struct, like a 2D matrix x n dimensional object, to pass and apply the Gaussian filter per dim-3 layer;
2. how to create this cycle on the MTLPipelineState object so that each buffer between C and D uses the previously generated image.
Here is the answer. I was trying to reinvent the wheel, but found that there is a nifty Metal Performance Shaders kernel called MPSImageFindKeypoints which does all of the above nicely. The code is below and it works; just make sure you instantiate your own MTLDevice, MTLCommandQueue and MPSImageFindKeypoints, as well as the MTLTextures.
// Start with converting the image
let inputTexture = getMTLTexture(from: getCGImage(from: image)!)
// Create a texture descriptor to get the buffer for transforming into a format compatible with MPSImageKeyPoints
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .r8Unorm, width: self.width, height: self.height, mipmapped: false)
textureDescriptor.usage = [.shaderRead, .shaderWrite]
let keyPoints = self.device.makeTexture(descriptor: textureDescriptor)
let imageConversionBuffer = self.commandQueue!.makeCommandBuffer()!
self.imageConversion!.encode(commandBuffer: imageConversionBuffer, sourceTexture: inputTexture, destinationTexture: keyPoints!)
imageConversionBuffer.commit()
imageConversionBuffer.waitUntilCompleted()
// Find the key points, allowing up to w*h keypoints (stars) and a 0.8 minimum value threshold
let maxpoints = self.width*self.height
let keyPointCountBuffer = self.device.makeBuffer(length: MemoryLayout<Int>.stride, options: .cpuCacheModeWriteCombined)
let keyPointDataBuffer = self.device.makeBuffer(length: MemoryLayout<MPSImageKeypointData>.stride*maxpoints, options: .cpuCacheModeWriteCombined)
let keyPointBuffer = self.commandQueue!.makeCommandBuffer()
self.findKeyPoints!.encode(to: keyPointBuffer!, sourceTexture: keyPoints!, regions: &self.filterRegion, numberOfRegions: 1, keypointCount: keyPointCountBuffer!, keypointCountBufferOffset: 0, keypointDataBuffer: keyPointDataBuffer!, keypointDataBufferOffset: 0)
// Finally run the filter
keyPointBuffer!.commit()
keyPointBuffer!.waitUntilCompleted()
// Extract the blobs
let starCount = keyPointCountBuffer!.contents().bindMemory(to: Int.self, capacity: 1)
print("Found \(starCount.pointee) stars")
let coordinatePointer = keyPointDataBuffer!.contents().bindMemory(to: MPSImageKeypointData.self, capacity: starCount.pointee)
let coordinateBuffer = UnsafeBufferPointer(start: coordinatePointer, count: starCount.pointee)
let coordinates = Array(coordinateBuffer)
var results = [[Int]]()
for i in 0..<starCount.pointee {
    let coordinate = coordinates[i].keypointCoordinate
    results.append([Int(coordinate[0]), Int(coordinate[1])])
}
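The snippet references several properties that the answer assumes you set up yourself (device, commandQueue, imageConversion, findKeyPoints, filterRegion, plus the image width and height). A minimal sketch of that setup, assuming a 0.8 threshold and a search region covering the whole texture (the dimensions are placeholders), might look like this:
import Metal
import MetalPerformanceShaders

// One-time setup for the objects used in the snippet above (sketch only).
let width = 1024   // assumed image width in pixels
let height = 768   // assumed image height in pixels

let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!

// Converts the RGBA input texture to the single-channel .r8Unorm format
// that MPSImageFindKeypoints operates on.
let imageConversion = MPSImageConversion(device: device,
                                         srcAlpha: .alphaIsOne,
                                         destAlpha: .alphaIsOne,
                                         backgroundColor: nil,
                                         conversionInfo: nil)

// Report at most width*height keypoints with a minimum value of 0.8.
var rangeInfo = MPSImageKeypointRangeInfo(maximumKeypoints: width * height,
                                          minimumThresholdValue: 0.8)
let findKeyPoints = MPSImageFindKeypoints(device: device, info: &rangeInfo)

// Search the whole texture.
var filterRegion = MTLRegionMake2D(0, 0, width, height)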
I'm trying to create a custom geometry object in SceneKit, which should be a plane with an arbitrary shape. I'm supplying the outlining vertices of the shape, and want to fill up the inside of it.
So far I have been using this code:
extension SCNGeometry {
    static func polygonPlane(vertices: [SCNVector3]) -> SCNGeometry {
        var indices: [Int32] = [Int32(vertices.count)]
        var index: Int32 = 0
        for _ in vertices {
            indices.append(index)
            index += 1
        }

        let vertexSource = SCNGeometrySource(vertices: vertices)
        let textureCoords: [CGPoint] = [] // Fix to map textures to the polygon plane...
        let textureCoordsSource = SCNGeometrySource(textureCoordinates: textureCoords)
        let indexData = Data(bytes: indices, count: indices.count * MemoryLayout<Int32>.size)
        let element = SCNGeometryElement(data: indexData, primitiveType: .polygon, primitiveCount: 1, bytesPerIndex: MemoryLayout<Int32>.size)
        let geometry = SCNGeometry(sources: [vertexSource, textureCoordsSource], elements: [element])

        let imageMaterial = SCNMaterial()
        imageMaterial.diffuse.contents = UIImage(named: "texture.jpg")
        let scaleX = (Float(1)).rounded()
        let scaleY = (Float(1)).rounded()
        imageMaterial.diffuse.contentsTransform = SCNMatrix4MakeScale(scaleX, scaleY, 0)
        imageMaterial.isDoubleSided = true
        geometry.firstMaterial = imageMaterial
        return geometry
    }
}
This works reasonably well when making simpler polygon shapes, but it does not work as intended when the shape becomes more complex and narrow in different places. I also don't know of any way to create texture coordinates in order to apply a custom texture with this approach.
I think I need to utilize some kind of polygon triangulation algorithm in order to break the shape into triangles, and then use the correct SCNGeometryPrimitiveType such as .triangles or .triangleStrip. This could probably also allow me to do a UV-mapping for the texture coordinates, however I'm not sure how that would work as of right now.
The polygon triangulation algorithm would need to be able to handle 3D coordinates, as the created 2D geometry should exist in a 3D world (you should be able to create tilted polygon planes etc.). I have not been able to find any 3D polygon triangulation algorithms already implemented in Swift yet.
To be clear on the texture coordinates: the texture that would be used is a repeating texture such as this one:
For complex cases, SCNShape is better suited, as it uses a more elaborate triangulation (Delaunay).
A simple SCNGeometryElement of type SCNGeometryPrimitiveTypePolygon will generate a triangle fan.
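For illustration, a minimal sketch of the SCNShape route, assuming the outline has already been projected into the plane's own 2D coordinate system (the helper name and the flatness value are arbitrary):
import SceneKit
import UIKit

// Build a flat, filled geometry from a 2D outline (points assumed to be in the
// plane's local coordinate system, in scene units).
func flatShape(from outline: [CGPoint]) -> SCNGeometry? {
    guard let first = outline.first else { return nil }
    let path = UIBezierPath()
    path.move(to: first)
    outline.dropFirst().forEach { path.addLine(to: $0) }
    path.close()
    path.flatness = 0.001 // finer tessellation of curved segments

    // Extrusion depth 0 keeps it a flat plane; SCNShape triangulates the interior.
    return SCNShape(path: path, extrusionDepth: 0)
}
The node holding this geometry can then be positioned and rotated in 3D to match the original plane orientation.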
I am currently facing the problem that I want to calculate the angle in radians from the camera's position to a target position. However, this calculation needs to take into account the heading of the camera as well.
For example, when the camera is facing away from the object the function should return π. So far the function I have written works most of the time. However, when the user gets close to the x and z axes, the arrow does not point to the target any more; rather, it points slightly to the left or right depending on whether you are in positive or negative x and z space.
Currently I'm not sure why my function does not work. The only explanation I would have for this behavior is gimbal lock. However, I'm not quite sure how to implement the same function using quaternions.
I also attached some photos to this post so that the issue is a little clearer.
Here is the function I'm using right now:
func getAngle() -> Float {
    guard let pointOfView = self.sceneView.session.currentFrame else { return 0.0 }
    let cameraPosition = pointOfView.camera.transform.columns.3
    let heading = getUserVector()
    let distance = SCNVector3Make(TargetPosition.x - cameraPosition.x, TargetPosition.y - cameraPosition.y - TargetPosition.y, TargetPosition.z - cameraPosition.z)
    let heading_scalar = sqrtf(heading.x * heading.x + heading.z * heading.z)
    let distance_scalar = sqrtf(distance.z * distance.z + distance.z * distance.z)
    let x = ((heading.x * distance.x) + (heading.z * distance.z) / (heading_scalar * distance_scalar))
    let theta = acos(max(min(x, 1), -1))

    if theta < 0.35 {
        return 0
    }
    if (heading.x * (distance.z / distance_scalar) - heading.z * (distance.x / distance_scalar)) > 0 {
        return theta
    } else {
        return -theta
    }
}
func getUserVector() -> SCNVector3 { // (direction)
    if let frame = self.sceneView.session.currentFrame {
        let mat = SCNMatrix4(frame.camera.transform) // 4x4 transform matrix describing camera in world space
        let dir = SCNVector3(-1 * mat.m31, -1 * mat.m32, -1 * mat.m33) // orientation of camera in world space
        print(mat)
        return dir
    }
    return SCNVector3(0, 0, -1)
}
Consider the following image as an example. The arrow in the top right corner should be pointing straight up to follow the line to the center object, but instead it is pointing slightly to the left. Here I am aligned with the z-axis; the same behavior happens when aligning with the x-axis.
I figured out the answer to my problem. The solution was transforming the object into the perspective of the camera and then simply taking the atan2 to get the angle between the camera and the object. Hope this post will help future readers!
func getAngle() -> Float {
    guard let pointOfView = self.sceneView.session.currentFrame else { return 0.0 }
    let cameraPosition = pointOfView.camera.transform
    let targetPosition = simd_float4x4(targetNode.transform)
    let newTransform = simd_mul(cameraPosition.inverse, targetPosition).columns.3
    let theta = atan2(newTransform.z, newTransform.y)
    return theta + (Float.pi / 2)
}
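As a usage sketch only (arrowImageView is a hypothetical 2D arrow in the HUD, and updating from the SceneKit render loop is just one reasonable choice):
// Hypothetical per-frame update from SCNSceneRendererDelegate: rotate a 2D
// arrow view so it keeps pointing at the target while the camera moves.
func renderer(_ renderer: SCNSceneRenderer, updateAtTime time: TimeInterval) {
    let angle = getAngle()
    DispatchQueue.main.async {
        self.arrowImageView.transform = CGAffineTransform(rotationAngle: CGFloat(angle))
    }
}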
I am currently writing an application where I am doing image processing (with Core Image) on a 2D image that includes the face (along with a saved instance of ARSCNFaceGeometry). I am having trouble determining how to calculate the x,y point value to use in Core Image for the points that correspond with those in ARFaceGeometry.vertices.
I am capturing the 2D image by calling ARSCNView.snapshot(), then storing and processing it as a CIImage.
I am currently using the texture coordinates to try to calculate the x,y position on the CIImage, but I haven't had a ton of experience using Core Image and couldn't figure out if this is the attribute I should be using.
Here is what I currently have to calculate the coordinates of a point in CIImage x,y space. I'm trying to produce the CIVector of the point. What am I doing wrong?
let imgAsCIImage = /* The CIImage of the ARSCNView Snapshot */
let faceDotPos = /* The index I am calculating point for */
let pointTexCoord = faceGeometry.textureCoordinates[faceDotPos]
let imageFrame = imgAsCIImage.extent
let xPoint = (CGFloat(pointTexCoord.x) * imageFrame.width)
let yPoint = (CGFloat(pointTexCoord.y) * imageFrame.height)
return CIVector(x: xPoint,y: yPoint)
In ARSCNView I can access a property of the camera node called worldFront, which represents the camera's rotation. I would like to calculate a similar vector from CoreMotion values, without using ARSCNView, just data from CoreMotion, so that I can get a vector that would be equal to worldFront in ARSCNView if the camera was facing the same direction. Can someone explain to me how to calculate such a value?
The attitude property could probably help:
func rollCam(motion: CMDeviceMotion) {
    let attitude = motion.attitude
    let roll = Float(attitude.roll - M_PI/2)
    let yaw = Float(attitude.yaw)
    let pitch = Float(attitude.pitch)
    camNode.eulerAngles = SCNVector3Make(roll, -yaw, pitch)
}
With this piece of code, quite a long time ago, I experimented a bit with CoreMotion. I was trying to first detect human walking and then (with the startDeviceMotionUpdates data) move and roll the camera near an "anchored" SCNBox. Later on, ARKit solved my need with the ARAnchor class.
What feature are you looking for?
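For completeness, a minimal sketch of feeding rollCam from CoreMotion updates (the 60 Hz interval and the surrounding class context are assumptions):
import CoreMotion

let motionManager = CMMotionManager()

func startCameraUpdates() {
    guard motionManager.isDeviceMotionAvailable else { return }
    motionManager.deviceMotionUpdateInterval = 1.0 / 60.0
    motionManager.startDeviceMotionUpdates(to: .main) { [weak self] motion, _ in
        guard let motion = motion else { return }
        self?.rollCam(motion: motion)
    }
}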
I have found the answer:
override var cameraFrontVector: double3 {
    guard let quaternion = motionService.deviceMotion?.attitude.quaternion else { return .zero }
    let x = 2 * -(quaternion.x * quaternion.z + quaternion.w * quaternion.y)
    let z = 2 * (quaternion.y * quaternion.z - quaternion.w * quaternion.x)
    let y = 2 * (quaternion.x * quaternion.x + quaternion.y * quaternion.y) - 1
    return double3(x: x, y: y, z: z)
}
This gives me values like worldFront in SCNNode.
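Here, motionService is assumed to be a CMMotionManager whose device-motion updates have already been started; a sketch of that setup, with the reference frame as an assumption:
import CoreMotion

let motionService = CMMotionManager()

func startMotion() {
    // Attitude is then reported relative to a frame with z vertical, which keeps
    // the computed front vector comparable to SceneKit's gravity-aligned world space.
    motionService.startDeviceMotionUpdates(using: .xArbitraryZVertical)
}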