I am currently writing an application that does image processing (with Core Image) on a 2D image that includes a face, alongside a saved instance of ARSCNFaceGeometry. I am having trouble determining how to calculate the x,y point in Core Image coordinates that corresponds to a given vertex in ARFaceGeometry.vertices.
I am capturing the 2D image by calling ARSCNView.snapshot() and storing and processing it as a CIImage.
I am currently using the texture coordinates to try to calculate the x,y position on the CIImage, but I haven't had much experience with Core Image and couldn't figure out whether this is the attribute I should be using.
Here is what I currently have to calculate the coordinates of a point in CIImage x,y space. I'm trying to produce the CIVector of the point. What am I doing wrong?
let imgAsCIImage = /* The CIImage of the ARSCNView Snapshot */
let faceDotPos = /* The index I am calculating point for */
let pointTexCoord = faceGeometry.textureCoordinates[faceDotPos]
let imageFrame = imgAsCIImage.extent
let xPoint = (CGFloat(pointTexCoord.x) * imageFrame.width)
let yPoint = (CGFloat(pointTexCoord.y) * imageFrame.height)
return CIVector(x: xPoint, y: yPoint)
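One thing worth double-checking with this mapping: Core Image places the origin of an image's extent at the bottom-left, while UIKit-style coordinates put it at the top-left, so the y value may need flipping. A minimal sketch, assuming the normalized coordinates are measured from the top-left and otherwise line up with the snapshot:
let xPoint = CGFloat(pointTexCoord.x) * imageFrame.width
// Flip y into Core Image's bottom-left-origin space (assumption: the input
// coordinate is measured from the top of the image).
let yPoint = imageFrame.height - CGFloat(pointTexCoord.y) * imageFrame.height
return CIVector(x: xPoint, y: yPoint)
It is also worth noting that ARFaceGeometry.textureCoordinates describe where each vertex samples the face texture, not where it lands in the camera image, so projecting the corresponding vertex into screen space may be what is actually needed here.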
I'm implementing a Gaussian-subtract function that extracts features of 2D Gaussian-like objects from an input image. The algorithm is as follows:
inputImage -> contrast image and threshold to 255 -> stack of sigma(n)-blurred intermediate 2D arrays B -> stack of (input - B(n)) intermediate 2D arrays as C -> max value + index of each C(n) 2D array as D -> draw circle with sigma(n) for all in B -> repeat the cycle from C until the max value reaches 0.
I found some MTLFunction objects for 2D Gaussian blur, and I can write my own shaders for the subtract, max-value and draw-circle steps, but I am unsure how the MTLTexture objects can be cycled across multiple passes of the algorithm without writing redundant-looking code in my filter class.
Can anyone point me to a link where I can figure out:
1- whether I can use a custom struct, like a 2D matrix x n-dimensional object, to pass in and apply the Gaussian filter per layer of the third dimension
2- how to create this cycle on the pipeline state object so that each pass between C and D uses the previously generated image (a rough ping-pong sketch is below)
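For illustration of point 2, the usual pattern is to allocate two scratch textures and swap ("ping-pong") them between passes, so each pass reads what the previous one wrote without per-pass boilerplate. A minimal sketch, assuming a device, commandQueue, an input MTLTexture and a list of sigmas already exist; it chains MPSImageGaussianBlur passes purely to show the swap, and the real algorithm would encode its own kernels the same way:
import Metal
import MetalPerformanceShaders

func chainedPasses(input: MTLTexture, device: MTLDevice,
                   commandQueue: MTLCommandQueue, sigmas: [Float]) -> MTLTexture {
    let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: input.pixelFormat,
                                                        width: input.width,
                                                        height: input.height,
                                                        mipmapped: false)
    desc.usage = [.shaderRead, .shaderWrite]
    var ping: MTLTexture = input                                  // source of the current pass
    var pong: MTLTexture = device.makeTexture(descriptor: desc)!  // destination of the current pass

    let commandBuffer = commandQueue.makeCommandBuffer()!
    for sigma in sigmas {
        let blur = MPSImageGaussianBlur(device: device, sigma: sigma)
        blur.encode(commandBuffer: commandBuffer, sourceTexture: ping, destinationTexture: pong)
        swap(&ping, &pong)                        // this pass's output feeds the next pass
        if pong === input {                       // never write back into the original input
            pong = device.makeTexture(descriptor: desc)!
        }
    }
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    return ping                                   // result of the last pass
}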
Here is the answer. I was trying to reinvent the wheel, but found that there is a nifty Metal Performance Shaders kernel called MPSImageFindKeypoints which does all of the above nicely. The code is below and it works; just make sure you instantiate your own MTLDevice, MTLCommandQueue and MPSImageFindKeypoints, as well as the MTLTextures (a setup sketch follows the code).
// Start with converting the image
let inputTexture = getMTLTexture(from: getCGImage(from: image)!)
// Create a texture descriptor for a destination texture in a format compatible with MPSImageFindKeypoints
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .r8Unorm, width: self.width, height: self.height, mipmapped: false)
textureDescriptor.usage = [.shaderRead, .shaderWrite]
let keyPoints = self.device.makeTexture(descriptor: textureDescriptor)
let imageConversionBuffer = self.commandQueue!.makeCommandBuffer()!
self.imageConversion!.encode(commandBuffer: imageConversionBuffer, sourceTexture: inputTexture, destinationTexture: keyPoints!)
imageConversionBuffer.commit()
imageConversionBuffer.waitUntilCompleted()
// Find the key points, allowing up to w*h of them and using a 0.8 minimum value threshold
let maxpoints = self.width*self.height
let keyPointCountBuffer = self.device.makeBuffer(length: MemoryLayout<Int>.stride, options: .cpuCacheModeWriteCombined)
let keyPointDataBuffer = self.device.makeBuffer(length: MemoryLayout<MPSImageKeypointData>.stride*maxpoints, options: .cpuCacheModeWriteCombined)
let keyPointBuffer = self.commandQueue!.makeCommandBuffer()
self.findKeyPoints!.encode(to: keyPointBuffer!, sourceTexture: keyPoints!, regions: &self.filterRegion, numberOfRegions: 1, keypointCount: keyPointCountBuffer!, keypointCountBufferOffset: 0, keypointDataBuffer: keyPointDataBuffer!, keypointDataBufferOffset: 0)
// Finally run the filter
keyPointBuffer!.commit()
keyPointBuffer!.waitUntilCompleted()
// Extract the blobs
let starCount = keyPointCountBuffer!.contents().bindMemory(to: Int.self, capacity: 1)
print("Found \(starCount.pointee) stars")
let coordinatePointer = keyPointDataBuffer!.contents().bindMemory(to: MPSImageKeypointData.self, capacity: starCount.pointee)
let coordinateBuffer = UnsafeBufferPointer(start: coordinatePointer, count: starCount.pointee)
let coordinates = Array(coordinateBuffer)
var results = [[Int]]()
for i in 0..<starCount.pointee {
    let coordinate = coordinates[i].keypointCoordinate
    results.append([Int(coordinate[0]), Int(coordinate[1])])
}
I need to export georeferenced images from Leaflet.js on the client side. Exporting an image from Leaflet is not a problem, as there are plenty of existing plugins for this, but I'd like to include a world file with the export so the resulting image can be read into GIS software. I have a working script for this, but I can't seem to nail down the correct parameters for my world file such that the resulting georeferenced image is positioned exactly correctly.
Here's my current script
// map is a Leaflet map object
let bounds = map.getBounds(); // Leaflet LatLngBounds
let topLeft = bounds.getNorthWest();
let bottomRight = bounds.getSouthEast();
let width_deg = bottomRight.lng - topLeft.lng;
let height_deg = topLeft.lat - bottomRight.lat;
let width_px = $(map._container).width() // Width of the map in px
let height_px = $(map._container).height() // Height of the map in px
let scaleX = width_deg / width_px;
let scaleY = height_deg / height_px;
let jgwText = `${scaleX}
0
0
-${scaleY}
${topLeft.lng}
${topLeft.lat}`
This seems to work well at large scales (i.e. zoomed in to city level or so), but at smaller scales there is some distortion along the y-axis. One thing I noticed is that all the examples of world files I can find (and those produced by QGIS or ArcMap) have the x-scale and y-scale parameters exactly equal (and oppositely signed). In my calculations, these terms are different unless you are sitting right on the equator.
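(As far as I can tell this is expected with degree-based scales on a Web Mercator map: a square screen pixel always covers the same number of degrees of longitude at a given zoom, but the degrees of latitude it covers shrink away from the equator, roughly
scaleY ≈ scaleX * cos(latitude)
so the two scales can only be equal in degrees right at the equator.)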
Example world file produced from QGIS
0.08984380916303301 // x-scale (size of px in x direction)
0 // rotation parameter 1
0 // rotation parameter 2
-0.08984380916303301 // y-scale (size of px in y direction)
-130.8723208723141056 // x-coord of top left px
51.73651369984968085 // y-coord of top left px
Example world file produced from my calcs
0.021972656250000017
0
0
-0.015362443783773333
-130.91308593750003
51.781435604431195
Example of produced image using my calcs with correct state boundaries overlaid:
Does anyone have any idea what I'm doing wrong here?
The problem was solved by using EPSG:3857 for the world file, and ensuring the width and height of the map bounds were also measured in this coordinate system. I had tried using EPSG:3857 for the world file, but measured the width and height of the map bounds using Leaflet's L.map.distance() function. To solve the problem, I instead projected the corner points of the map bounds to EPSG:3857 using L.CRS.EPSG3857.project(), then simply subtracted the X,Y values.
Corrected code is shown below, where map is a Leaflet map object (L.map)
// Get map bounds and corner points in 4326
let bounds = map.getBounds();
let topLeft = bounds.getNorthWest();
let bottomRight = bounds.getSouthEast();
let topRight = bounds.getNorthEast();
// get width and height in px of the map container
let width_px = $(map._container).width()
let height_px = $(map._container).height()
// project corner points to 3857
let topLeft_3857 = L.CRS.EPSG3857.project(topLeft)
let topRight_3857 = L.CRS.EPSG3857.project(topRight)
let bottomRight_3857 = L.CRS.EPSG3857.project(bottomRight)
// calculate width and height in meters using epsg:3857
let width_m = topRight_3857.x - topLeft_3857.x
let height_m = topRight_3857.y - bottomRight_3857.y
// calculate the scale in x and y directions in meters (this is the width and height of a single pixel in the output image)
let scaleX_m = width_m / width_px
let scaleY_m = height_m / height_px
// worldfiles need the CENTRE of the top left px; what we currently have is the TOP LEFT corner of that px.
// Adjust by moving half a pixel right (+x) and half a pixel down (-y)
let topLeftCenterPxX = topLeft_3857.x + (scaleX_m / 2)
let topLeftCenterPxY = topLeft_3857.y - (scaleY_m / 2)
// format the text of the worldfile
let jgwText = `${scaleX_m}
0
0
-${scaleY_m}
${topLeftCenterPxX}
${topLeftCenterPxY}`
For anyone else with this problem, you'll know things are correct when your scale-x and scale-y values are exactly equal (but oppositely signed)!
Thanks @IvanSanchez for pointing me in the right direction :)
I've been trying to figure this out for a few days now.
Given an ARKit-based app where I track a user's face, how can I get the face's rotation in absolute terms, from its anchor?
I can get the transform of the ARAnchor, which is a simd_matrix4x4.
There's a lot of info on how to get the position out of that matrix (it's in the last column, columns.3), but nothing on the rotation!
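For reference, extracting the position really is just a matter of reading that last column; a minimal sketch, assuming faceAnchor is the ARFaceAnchor:
// Translation lives in the last column of the 4x4 transform.
let t = faceAnchor.transform.columns.3
let position = SIMD3<Float>(t.x, t.y, t.z)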
I want to be able to control a 3D object outside of the app, by passing YAW, PITCH and ROLL.
The latest thing I tried actually works somewhat:
let arFrame = session.currentFrame!
guard let faceAnchor = arFrame.anchors[0] as? ARFaceAnchor else { return }
let faceMatrix = SCNMatrix4.init(faceAnchor.transform)
let node = SCNNode()
node.transform = faceMatrix
let rotation = node.worldOrientation
rotation.x, .y and .z have values I could use, but as I move my phone the values change. For instance, if I turn 180° and keep looking at the phone, the values change wildly based on the position of the phone.
I tried changing the world alignment in the ARConfiguration, but that didn't make a difference.
Am I reading the wrong parameters? This should have been a lot easier!
I've figured it out...
Once you have the face anchor, some calculations need to happen with its transform matrix, and the camera's transform.
Like this:
let arFrame = session.currentFrame!
guard let faceAnchor = arFrame.anchors[0] as? ARFaceAnchor else { return }
let projectionMatrix = arFrame.camera.projectionMatrix(for: .portrait, viewportSize: self.sceneView.bounds.size, zNear: 0.001, zFar: 1000)
let viewMatrix = arFrame.camera.viewMatrix(for: .portrait)
let projectionViewMatrix = simd_mul(projectionMatrix, viewMatrix)
let modelMatrix = faceAnchor.transform
let mvpMatrix = simd_mul(projectionViewMatrix, modelMatrix)
// This allows me to just get a .x .y .z rotation from the matrix, without having to do crazy calculations
let newFaceMatrix = SCNMatrix4.init(mvpMatrix)
let faceNode = SCNNode()
faceNode.transform = newFaceMatrix
let rotation = vector_float3(faceNode.worldOrientation.x, faceNode.worldOrientation.y, faceNode.worldOrientation.z)
rotation.x, .y and .z will return the face's pitch, yaw and roll (respectively).
I'm adding a small multiplier and inverting two of the axes, so it ends up like this:
yaw = -rotation.y*3
pitch = -rotation.x*3
roll = rotation.z*1.5
Phew!
I understand that you are using the front camera and ARFaceTrackingConfiguration, which is not supposed to give you absolute values. I would try to configure a second ARSession for the back camera with ARWorldTrackingConfiguration, which does provide absolute values. The final solution will probably require values from both ARSessions. I haven't tested this hypothesis yet, but it seems to be the only way.
UPDATE: a quote from the ARWorldTrackingConfiguration documentation -
The ARWorldTrackingConfiguration class tracks the device's movement with six degrees of freedom (6DOF): specifically, the three rotation axes (roll, pitch, and yaw), and three translation axes (movement in x, y, and z). This kind of tracking can create immersive AR experiences: A virtual object can appear to stay in the same place relative to the real world, even as the user tilts the device to look above or below the object, or moves the device around to see the object's sides and back.
Apparently, other tracking configurations do not have this ability.
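A minimal sketch of the world-tracking setup being suggested, assuming a separate ARSession instance (whether two sessions can actually run side by side is exactly the part that still needs testing):
import ARKit

// Hypothetical second session dedicated to world tracking.
let worldSession = ARSession()
let worldConfig = ARWorldTrackingConfiguration()
worldConfig.worldAlignment = .gravityAndHeading  // yaw referenced to compass north
worldSession.run(worldConfig)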
I am trying to merge two images using VNImageHomographicAlignmentObservation. I am currently getting a 3x3 matrix that looks like this:
simd_float3x3([[0.99229, -0.00451023, -4.32607e-07],
               [0.00431724, 0.993118, 2.38839e-07],
               [-72.2425, -67.9966, 0.999288]])
But I don't know how to use these values to merge into one image. There doesn't seem to be any documentation on what these values even mean. I found some information on transformation matrices here: Working with matrices.
But so far nothing else has helped me... Any suggestions?
My Code:
func setup() {
    let floatingImage = UIImage(named: "DJI_0333")!
    let referenceImage = UIImage(named: "DJI_0327")!
    let request = VNHomographicImageRegistrationRequest(targetedCGImage: floatingImage.cgImage!, options: [:])
    let handler = VNSequenceRequestHandler()
    try! handler.perform([request], on: referenceImage.cgImage!)
    if let results = request.results as? [VNImageHomographicAlignmentObservation] {
        print("Perspective warp found: \(results.count)")
        results.forEach { observation in
            // A matrix with 3 rows and 3 columns.
            let matrix = observation.warpTransform
            print(matrix)
        }
    }
}
This homography matrix H describes how to project one of your images onto the image plane of the other image. To transform each pixel to its projected location, you can compute x' = H * x using homogeneous coordinates (basically, take your 2D image coordinate, add a 1.0 as the third component, apply the matrix H, and go back to 2D by dividing by the third component of the result).
The most efficient way to do this for every pixel, is to write this matrix multiplication in homogeneous space using CoreImage. CoreImage offers multiple shader kernel types: CIColorKernel, CIWarpKernel and CIKernel. For this task, we only want to transform the location of each pixel, so a CIWarpKernel is what you need. Using the Core Image Shading Language, that would look as follows:
import CoreImage
let warpKernel = CIWarpKernel(source:
"""
kernel vec2 warp(mat3 homography)
{
vec3 homogen_in = vec3(destCoord().x, destCoord().y, 1.0); // create homogeneous coord
vec3 homogen_out = homography * homogen_in; // transform by homography
return homogen_out.xy / homogen_out.z; // back to normal 2D coordinate
}
"""
)!
Note that the shader wants a mat3 called homography, which is the shading-language equivalent of the simd_float3x3 matrix H. When calling the shader, the matrix is expected to be stored in a CIVector; to convert it, use:
let (col0, col1, col2) = yourHomography.columns
let homographyCIVector = CIVector(values:[CGFloat(col0.x), CGFloat(col0.y), CGFloat(col0.z),
CGFloat(col1.x), CGFloat(col1.y), CGFloat(col1.z),
CGFloat(col2.x), CGFloat(col2.y), CGFloat(col2.z)], count: 9)
When you apply the CIWarpKernel to an image, you have to tell CoreImage how big the output should be. To merge the warped and reference image, the output should be big enough to cover the whole projected and original image. We can compute the size of the projected image by applying the homography to each corner of the image rect (this time in Swift, CoreImage calls this rect the extent):
/**
* Convert a 2D point to a homogeneous coordinate, transform by the provided homography,
* and convert back to a non-homogeneous 2D point.
*/
func transform(_ point:CGPoint, by homography:matrix_float3x3) -> CGPoint
{
let inputPoint = float3(Float(point.x), Float(point.y), 1.0)
var outputPoint = homography * inputPoint
outputPoint /= outputPoint.z
return CGPoint(x:CGFloat(outputPoint.x), y:CGFloat(outputPoint.y))
}
func computeExtentAfterTransforming(_ extent:CGRect, with homography:matrix_float3x3) -> CGRect
{
let points = [transform(extent.origin, by: homography),
transform(CGPoint(x: extent.origin.x + extent.width, y:extent.origin.y), by: homography),
transform(CGPoint(x: extent.origin.x + extent.width, y:extent.origin.y + extent.height), by: homography),
transform(CGPoint(x: extent.origin.x, y:extent.origin.y + extent.height), by: homography)]
var (xmin, xmax, ymin, ymax) = (points[0].x, points[0].x, points[0].y, points[0].y)
points.forEach { p in
xmin = min(xmin, p.x)
xmax = max(xmax, p.x)
ymin = min(ymin, p.y)
ymax = max(ymax, p.y)
}
let result = CGRect(x: xmin, y:ymin, width: xmax-xmin, height: ymax-ymin)
return result
}
let warpedExtent = computeExtentAfterTransforming(ciFloatingImage.extent, with: homography.inverse)
let outputExtent = warpedExtent.union(ciFloatingImage.extent)
Now you can create a warped version of your floating image:
let ciFloatingImage = CIImage(image: floatingImage)
let ciWarpedImage = warpKernel.apply(extent: outputExtent, roiCallback:
{
(index, rect) in
return computeExtentAfterTransforming(rect, with: homography.inverse)
},
image: ciFloatingImage,
arguments: [homographyCIVector])!
The roiCallback is there to tell CoreImage which part of the input image is needed to compute a certain part of the output. CoreImage uses this to apply the shader on parts of the image block by block, such that it can process huge images. (See Creating Custom Filters in Apple's docs). A quick hack would be to always return CGRect.infinite here, but then CoreImage can't do any block-wise magic.
And lastly, create a composite image of the reference image and the warped image:
let ciReferenceImage = CIImage(image: referenceImage)
let ciResultImage = ciWarpedImage.composited(over: ciReferenceImage)
let resultImage = UIImage(ciImage: ciResultImage)
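One caveat: UIImage(ciImage:) defers rendering, which can lead to blank results when the image is saved or drawn outside of a UIImageView. If that happens, rendering explicitly through a CIContext is a common workaround (a minimal sketch):
let context = CIContext()
if let cgResult = context.createCGImage(ciResultImage, from: ciResultImage.extent) {
    let resultImage = UIImage(cgImage: cgResult)
    // use resultImage ...
}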
I am very new to SceneKit and 3D development in general. I'm playing around with ARKit and trying to fit a texture to a plane (well, really an SCNBox, but only the top surface), but I'm seriously failing and also failing to find anything helpful on the web.
I have a texture of a road that is a very long, rectangular PNG image; its width:height ratio is about 20:1.
I want to apply this texture to the surface of a table once ARKit has found the plane for me. I do not know the dimensions of the table before the app starts.
I can currently apply a texture to this plane, and also rotate the texture as desired.
What I would like to accomplish is to stretch the texture (keeping original ratio) so that the short sides of the plane and texture line up and then the texture continues until the end of the plane, cutting off or repeating depending on the length or ratio of the plane.
Here is the function that gets the SCNMaterial object:
class func getRunwayMaterial() -> SCNMaterial {
    let name = "runway"
    var mat = materials[name]
    if let mat = mat {
        return mat
    }
    mat = SCNMaterial()
    mat!.lightingModel = SCNMaterial.LightingModel.physicallyBased
    mat!.diffuse.contents = UIImage(named: "./Assets.scnassets/Materials/runway/runway.png")
    mat!.diffuse.wrapS = SCNWrapMode.repeat
    mat!.diffuse.wrapT = SCNWrapMode.repeat
    materials[name] = mat
    return mat!
}
This is the function that should be doing the scaling and rotating of the texture on the plane.
func setRunwayTextureScale(rotation: Float? = nil, material: SCNMaterial? = nil) {
    let texture = material != nil ? material! : planeGeometry.materials[4]
    var m: SCNMatrix4 = SCNMatrix4MakeScale(1, 1, 1)
    if rotation != nil {
        textureRotation = rotation! + textureRotation
    }
    m = SCNMatrix4Rotate(m, textureRotation, 0, 1, 0)
    texture.diffuse.contentsTransform = m
}
Please help me fill in the blanks here, and if anyone has any links or articles on how to do this kind of manipulation please link me!
Thanks!
Ethan
Edit: BTW, I'm using Xcode 9.
Try using:
material.diffuse.wrapS = SCNWrapModeRepeat;
material.diffuse.wrapT = SCNWrapModeRepeat;
This keeps the material from stretching; it simply tiles more and more copies of the same PNG.
You can also set the scale for the material by setting it to a width and height:
CGFloat width = self.planeGeometry.width;
CGFloat height = self.planeGeometry.length;
material.diffuse.contentsTransform = SCNMatrix4MakeScale(width, height, 1);
Sorry, I'm working with Objective-C here, but it should be pretty straightforward to translate.
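As a rough Swift translation, plus the aspect-ratio part of the question (fill the plane's short side with the texture exactly once, keep the road's 20:1 ratio, and let it repeat or get cut off along the long side). The function name, image name and the assumption that the road image runs along the geometry's S axis are placeholders, not something from the original answer:
import SceneKit
import UIKit

// Sketch: fit a wide road texture to a plane of planeWidth x planeLength (meters),
// preserving the image's aspect ratio.
func applyRoadTexture(to material: SCNMaterial,
                      planeWidth: CGFloat, planeLength: CGFloat) {
    let image = UIImage(named: "runway")!                 // the 20:1 road texture
    let imageAspect = image.size.width / image.size.height

    material.diffuse.contents = image
    material.diffuse.wrapS = .repeat                      // tile (or cut off) along the length
    material.diffuse.wrapT = .clamp                       // short side fits exactly once

    // One copy of the texture covers planeWidth (short side) by planeWidth * imageAspect
    // (long side), so this many copies are needed to span the full length:
    let repeatsAlongLength = planeLength / (planeWidth * imageAspect)
    material.diffuse.contentsTransform =
        SCNMatrix4MakeScale(Float(repeatsAlongLength), 1, 1)
}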
Also some good tutorials can be found on this link:
https://blog.markdaws.net/apple-arkit-by-example-ef1c8578fb59