I'm implementing a Gaussian-subtract function that extracts features of 2D Gaussian-like objects from an input image. The algorithm is as follows:
inputImageX -> contrast the image and threshold to 255 -> stack of sigma(n)-blurred intermediate 2D arrays B -> stack of input - B(n) intermediate 2D arrays as C -> max value + index of each C(n) 2D array as D -> draw a circle with sigma(n) for all in B -> repeat the cycle from C until the max value reaches 0.
I found some MTLFunction objects for 2D Gaussian blur, and I can create my own shaders for the subtract, max-value and draw-circle passes, but I am unsure how the MTLTexture objects can be cycled across multiple passes of the algorithm without writing redundant-looking code in my filter class.
Can anyone point me to a link where I can figure out:
1- whether I can use a custom struct, like a 2D matrix x n dimensional object, to pass in and apply the Gaussian filter per layer of the third dimension
2- how to create this cycle on the MTLComputePipelineState object so that each buffer between C and D uses the previously generated image (see the sketch below for the kind of loop I mean)
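For context, the texture-cycling pattern I have in mind looks roughly like this ping-pong loop (just a sketch; encodeSubtract, encodeMaxReduce, encodeDrawCircle, makeIntermediateTexture, maxValueBuffer and outputTexture are placeholders for my own pipelines and resources, not real API):
// Sketch only: reuse two textures between passes so that each pass reads the
// previous pass's output; all passes are encoded into a single command buffer.
let commandBuffer = commandQueue.makeCommandBuffer()!
let current = thresholdedInput               // plays the role of C(n-1)
let blurred = makeIntermediateTexture()      // plays the role of B(n)
for sigma in sigmas {
    let blur = MPSImageGaussianBlur(device: device, sigma: sigma)
    blur.encode(commandBuffer: commandBuffer, sourceTexture: current, destinationTexture: blurred) // B(n)
    encodeSubtract(commandBuffer, input: thresholdedInput, minus: blurred, into: current)          // C(n)
    encodeMaxReduce(commandBuffer, source: current, result: maxValueBuffer)                        // D(n)
    encodeDrawCircle(commandBuffer, into: outputTexture, maxLocation: maxValueBuffer, radius: sigma)
}
commandBuffer.commit()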
Here is the answer. I was trying to reinvent the wheel, but it turns out there is a nifty Metal Performance Shaders kernel called MPSImageFindKeypoints which does all of the above nicely. The code is below and it works; just make sure you instantiate your own MTLDevice, MTLCommandQueue and MPSImageFindKeypoints, as well as the MTLTextures.
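For completeness, the snippet assumes instance properties along these lines (a sketch of my setup; I used the plain device initializer for MPSImageConversion, and the range info matches the w*h maximum and 0.8 threshold mentioned in the comments below):
// Sketch of the setup the snippet relies on; adapt it to your own filter class.
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()
// Converts the RGBA input texture into the single-channel .r8Unorm texture used below.
let imageConversion = MPSImageConversion(device: device)
// Limit the keypoint search to width*height results with a 0.8 minimum value.
var rangeInfo = MPSImageKeypointRangeInfo(maximumKeypoints: UInt(width * height), minimumThresholdValue: 0.8)
let findKeyPoints = MPSImageFindKeypoints(device: device, info: &rangeInfo)
var filterRegion = MTLRegionMake2D(0, 0, width, height)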
// Start with converting the image
let inputTexture = getMTLTexture(from: getCGImage(from: image)!)
// Create a texture descriptor to get the buffer for transforming into a format compatible with MPSImageKeyPoints
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .r8Unorm, width: self.width, height: self.height, mipmapped: false)
textureDescriptor.usage = [.shaderRead, .shaderWrite]
let keyPoints = self.device.makeTexture(descriptor: textureDescriptor)
let imageConversionBuffer = self.commandQueue!.makeCommandBuffer()!
self.imageConversion!.encode(commandBuffer: imageConversionBuffer, sourceTexture: inputTexture, destinationTexture: keyPoints!)
imageConversionBuffer.commit()
imageConversionBuffer.waitUntilCompleted()
// Find keypoints, allowing up to w*h of them (stars), with a 0.8 minimum value threshold
let maxpoints = self.width*self.height
let keyPointCountBuffer = self.device.makeBuffer(length: MemoryLayout<Int>.stride, options: .cpuCacheModeWriteCombined)
let keyPointDataBuffer = self.device.makeBuffer(length: MemoryLayout<MPSImageKeypointData>.stride*maxpoints, options: .cpuCacheModeWriteCombined)
let keyPointBuffer = self.commandQueue!.makeCommandBuffer()
self.findKeyPoints!.encode(to: keyPointBuffer!, sourceTexture: keyPoints!, regions: &self.filterRegion, numberOfRegions: 1, keypointCount: keyPointCountBuffer!, keypointCountBufferOffset: 0, keypointDataBuffer: keyPointDataBuffer!, keypointDataBufferOffset: 0)
// Finally run the filter
keyPointBuffer!.commit()
keyPointBuffer!.waitUntilCompleted()
// Extract the blobs
let starCount = keyPointCountBuffer!.contents().bindMemory(to: Int.self, capacity: 1)
print("Found \(starCount.pointee) stars")
let coordinatePointer = keyPointDataBuffer!.contents().bindMemory(to: MPSImageKeypointData.self, capacity: starCount.pointee)
let coordinateBuffer = UnsafeBufferPointer(start: coordinatePointer, count: starCount.pointee)
let coordinates = Array(coordinateBuffer)
var results = [[Int]]()
for i in 0..<starCount.pointee {
let coordinate = coordinates[i].keypointCoordinate
results.append([Int(coordinate[0]), Int(coordinate[1])])
}
I've been doing some work with Core Image's convolution filters and I've noticed that sufficiently long chains of convolutions lead to unexpected outputs, which I suspect are the result of numerical overflow on the underlying integer, float, or half-float type being used to hold the pixel data. This is especially unexpected because the documentation says that every convolution's output value is "clamped to the range between 0.0 and 1.0", so ever-larger values should not accumulate over successive passes of the filter, but that's exactly what seems to be happening.
I've got some sample code here that demonstrates this surprising behavior. You should be able to paste it as is into just about any Xcode project, set a breakpoint at the end of it, run it on the appropriate platform (I'm using an iPhone XS, not a simulator), and then, when the break occurs, use Quick Look to inspect the filter chain.
import CoreImage
import CoreImage.CIFilterBuiltins
// --------------------
// CREATE A WHITE IMAGE
// --------------------
// the desired size of the image
let size = CGSize(width: 300, height: 300)
// create a pixel buffer to use as input; every pixel is bgra(0,0,0,0) by default
var pixelBufferOut: CVPixelBuffer?
CVPixelBufferCreate(kCFAllocatorDefault, Int(size.width), Int(size.height), kCVPixelFormatType_32BGRA, nil, &pixelBufferOut)
let input = pixelBufferOut!
// create an image from the input
let image = CIImage(cvImageBuffer: input)
// create a color matrix filter that will turn every pixel white
// bgra(0,0,0,0) becomes bgra(1,1,1,1)
let matrixFilter = CIFilter.colorMatrix()
matrixFilter.biasVector = CIVector(string: "1 1 1 1")
// turn the image white
matrixFilter.inputImage = image
let whiteImage = matrixFilter.outputImage!
// the matrix filter sets the image's extent to infinity
// crop it back to original size so Quick Looks can display the image
let cropped = whiteImage.cropped(to: CGRect(origin: .zero, size: size))
// ------------------------------
// CONVOLVE THE IMAGE SEVEN TIMES
// ------------------------------
// create a 3x3 convolution filter with every weight set to 1
let convolutionFilter = CIFilter.convolution3X3()
convolutionFilter.weights = CIVector(string: "1 1 1 1 1 1 1 1 1")
// 1
convolutionFilter.inputImage = cropped
let convolved = convolutionFilter.outputImage!
// 2
convolutionFilter.inputImage = convolved
let convolvedx2 = convolutionFilter.outputImage!
// 3
convolutionFilter.inputImage = convolvedx2
let convolvedx3 = convolutionFilter.outputImage!
// 4
convolutionFilter.inputImage = convolvedx3
let convolvedx4 = convolutionFilter.outputImage!
// 5
convolutionFilter.inputImage = convolvedx4
let convolvedx5 = convolutionFilter.outputImage!
// 6
convolutionFilter.inputImage = convolvedx5
let convolvedx6 = convolutionFilter.outputImage!
// 7
convolutionFilter.inputImage = convolvedx6
let convolvedx7 = convolutionFilter.outputImage!
// <-- put a breakpoint here
// when you run the code you can hover over the variables
// to see what the image looks like at various stages through
// the filter chain; you will find that the image is still white
// up until the seventh convolution, at which point it turns black
Further evidence that this is an overflow issue is that if I use a CIContext to render the image to an output pixel buffer, I have the opportunity to set the actual numerical type used during the render via the CIContextOption.workingFormat option. On my platform the default value is CIFormat.RGBAh, which means each color channel uses a 16-bit float. If instead I use CIFormat.RGBAf, which uses full 32-bit floats, the problem goes away, because it takes a lot more to overflow 32 bits than it does 16.
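Here's roughly what that render looks like (I believe the working-format option expects the format's raw value wrapped in an NSNumber):
// Render with a 32-bit float working format; the seventh convolution stays white.
let context = CIContext(options: [
    CIContextOption.workingFormat: NSNumber(value: CIFormat.RGBAf.rawValue)
])
var renderTarget: CVPixelBuffer?
CVPixelBufferCreate(kCFAllocatorDefault, Int(size.width), Int(size.height),
                    kCVPixelFormatType_32BGRA, nil, &renderTarget)
context.render(convolvedx7, to: renderTarget!)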
Is my insight into what's going on here correct or am I totally off? Is the documentation about clamping wrong or is this a bug with the filters?
It seems the documentation is outdated. Maybe it comes from a time when Core Image used 8-bit unsigned byte texture formats by default on iOS, because those are clamped between 0.0 and 1.0.
With the float-typed formats, the values aren't clamped anymore and are stored as returned by the kernel. And since you started with white (1.0) and applied 7 consecutive convolutions with unnormalized weights (1 instead of 1/9), you end up with values of 9^7 = 4,782,969 per channel, which is outside the range of a 16-bit float.
To avoid something like that, you should normalize your convolution weights so that they sum to 1.0.
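For the 3x3 kernel from the sample code, that would be:
// Each weight is 1/9, so a single pass can never push values above the input range.
let w = CGFloat(1.0 / 9.0)
convolutionFilter.weights = CIVector(values: [w, w, w, w, w, w, w, w, w], count: 9)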
By the way: to create a white image of a certain size, simply do this:
let image = CIImage(color: .white).cropped(to: CGRect(origin: .zero, size: CGSize(width: 300, height: 300)))
🙂
I'm trying to get the bone rotations relative to their parents, but I end up getting pretty weird angles.
I've tried everything: matrix multiplications, offsets, axis swapping, and no luck.
guard let bodyAnchor = anchor as? ARBodyAnchor else { continue }
let skeleton = bodyAnchor.skeleton
let jointTransforms = skeleton.jointLocalTransforms
for (i, jointTransform) in jointTransforms.enumerated() {
//RETRIEVE ANGLES HERE
}
In //RETRIEVE ANGLES HERE I've tried different approaches:
let n = SCNNode()
n.transform = SCNMatrix4(jointTransform)
print(n.eulerAngles)
In this try, I assign the joint transform to an SCNNode's transform so I can retrieve the eulerAngles, make them human-readable, and try to understand what's happening.
I get some joints to work, but I think it's pure coincidence or luck, because the rest of the bones rotate very weirdly.
In another try I get them using jointModelTransforms (Model, instead of Local), so all transforms are relative to the root bone of the skeleton.
With this approach I do matrix multiplications like this:
LocalMatrix = Inverse(JointModelMatrix) * (ParentJointModelMatrix)
To get the rotations relative to its parent, but it's the same situation: some bones rotate okay, others rotate weirdly. Pure coincidence, I bet.
Why do I want to get the bone rotations?
I'm trying to build a MoCap app with my phone that passes the rotations to Blender, building .BVH files from them so I can use them in Blender.
This is my own rig:
I've done this before with Kinect, but I've been trying for days to do it on ARKit 3 with no luck :(
Using simd_quatf(from:to:) with the right input should do it. I had trouble with weird angles until I started normalising the vectors:
guard let bodyAnchor = anchor as? ARBodyAnchor else { continue }
let skeleton = bodyAnchor.skeleton
let jointTransforms = skeleton.jointLocalTransforms
for (i, jointTransform) in jointTransforms.enumerated() {
// First I filter out the root (Hip) joint because it doesn't have a parent
let parentIndex = skeleton.definition.parentIndices[i]
guard parentIndex >= 0 else { continue } // root joint has parent index of -1
//RETRIEVE ANGLES HERE
let jointVectorFromParent = simd_make_float3(jointTransform.columns.3)
let referenceVector: SIMD3<Float>
if skeleton.definition.parentIndices[parentIndex] >= 0 {
referenceVector = simd_make_float3(jointTransforms[parentIndex].columns.3)
} else {
// The parent joint is the Hip joint which should have
// a vector of 0 going to itself
// It's impossible to calculate an angle from a vector of length 0,
// So we're using a vector that's just pointing up
referenceVector = SIMD3<Float>(x: 0, y: 1, z: 0)
}
// Normalizing is important because simd_quatf gives weird results otherwise
let jointNormalized = normalize(jointVectorFromParent)
let referenceNormalized = normalize(referenceVector)
let orientation = simd_quatf(from: referenceNormalized, to: jointNormalized)
print("angle of joint \(i) = \(orientation.angle)")
}
One important thing to keep in mind though:
ARKit 3 tracks only some joints (AFAIK the named joints in ARSkeleton.JointName). The other joints are extrapolated from those using a standardized skeleton. That means the angle you get for the elbow, for example, won't be the exact angle the tracked person's elbow has there.
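If you want to check which joints ARKit actually tracks versus extrapolates, something like this should do it (a sketch using ARSkeleton's isJointTracked(_:) and the definition's jointNames):
// Print every joint name together with whether ARKit actually tracks it.
for (i, name) in skeleton.definition.jointNames.enumerated() {
    print("\(i) \(name): tracked = \(skeleton.isJointTracked(i))")
}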
Just a guess… does this do the job?
let skeleton = bodyAnchor.skeleton
let jointTransforms = skeleton.jointLocalTransforms
for (i, jointTransform) in jointTransforms.enumerated() {
print(Transform(matrix: jointTransform).rotation)
}
I'm trying to create a custom geometry object in SceneKit, which should be a plane with an arbitrary shape. I'm supplying the outlining vertices of the shape, and want to fill up the inside of it.
So far I have been using this code:
extension SCNGeometry {
static func polygonPlane(vertices: [SCNVector3]) -> SCNGeometry {
var indices: [Int32] = [Int32(vertices.count)]
var index: Int32 = 0
for _ in vertices {
indices.append(index)
index += 1
}
let vertexSource = SCNGeometrySource(vertices: vertices)
let textureCoords : [CGPoint] = [] // Fix to map textures to the polygon plane...
let textureCoordsSource = SCNGeometrySource(textureCoordinates: textureCoords)
let indexData = Data(bytes: indices, count: indices.count * MemoryLayout<Int32>.size)
let element = SCNGeometryElement(data: indexData, primitiveType: .polygon, primitiveCount: 1, bytesPerIndex: MemoryLayout<Int32>.size)
let geometry = SCNGeometry(sources: [vertexSource, textureCoordsSource], elements: [element])
let imageMaterial = SCNMaterial()
imageMaterial.diffuse.contents = UIImage(named: "texture.jpg")
let scaleX = (Float(1)).rounded()
let scaleY = (Float(1)).rounded()
imageMaterial.diffuse.contentsTransform = SCNMatrix4MakeScale(scaleX, scaleY, 0)
imageMaterial.isDoubleSided = true
geometry.firstMaterial = imageMaterial
return geometry
}
}
This works reasonably well when making simpler polygon shapes, but it does not work as intended when the shape becomes more complex and narrow in different places. I also don't know of any way to create texture coordinates in order to apply a custom texture with this approach.
I think I need to utilize some kind of polygon triangulation algorithm in order to break the shape into triangles, and then use the correct SCNGeometryPrimitiveType such as .triangles or .triangleStrip. This could probably also allow me to do a UV-mapping for the texture coordinates, however I'm not sure how that would work as of right now.
The polygon triangulation algorithm would need to be able to handle 3D coordinates, as the created 2D geometry should exist in a 3D world (you should be able to create tilted polygon planes etc.). I have not been able to find any 3D polygon triangulation algorithms already implemented in Swift yet.
To be clear on the texture coordinates: the texture that would be used is a repeating texture such as this one:
For complex cases, SCNShape is better suited, as it uses a more elaborate (Delaunay) triangulation.
A simple SCNGeometryElement of type SCNGeometryPrimitiveTypePolygon will generate a triangle fan.
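A sketch of that approach, assuming the outline lies in the XY plane (for an arbitrarily oriented plane you would first transform the points into that plane and orient the resulting node afterwards):
// Sketch: build a flat, arbitrarily shaped plane by letting SCNShape triangulate a path.
func shapePlane(outline: [CGPoint]) -> SCNGeometry {
    let path = UIBezierPath()
    path.move(to: outline[0])
    for point in outline.dropFirst() {
        path.addLine(to: point)
    }
    path.close()
    path.flatness = 0.01                                // tessellation detail for curved segments
    let shape = SCNShape(path: path, extrusionDepth: 0) // depth 0 -> flat plane
    let material = SCNMaterial()
    material.diffuse.contents = UIImage(named: "texture.jpg")
    material.isDoubleSided = true
    shape.firstMaterial = material
    return shape
}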
I am trying to merge two images using VNImageHomographicAlignmentObservation. I am currently getting a 3x3 matrix that looks like this:
simd_float3x3([[  0.99229,    -0.00451023, -4.32607e-07],
               [  0.00431724,  0.993118,    2.38839e-07],
               [-72.2425,     -67.9966,     0.999288]])
But I don't know how to use these values to merge into one image. There doesn't seem to be any documentation on what these values even mean. I found some information on transformation matrices here: Working with matrices.
But so far nothing else has helped me... Any suggestions?
My Code:
func setup() {
let floatingImage = UIImage(named:"DJI_0333")!
let referenceImage = UIImage(named: "DJI_0327")!
let request = VNHomographicImageRegistrationRequest(targetedCGImage: floatingImage.cgImage!, options: [:])
let handler = VNSequenceRequestHandler()
try! handler.perform([request], on: referenceImage.cgImage!)
if let results = request.results as? [VNImageHomographicAlignmentObservation] {
print("Perspective warp found: \(results.count)")
results.forEach { observation in
// A matrix with 3 rows and 3 columns.
let matrix = observation.warpTransform
print(matrix) }
}
}
This homography matrix H describes how to project one of your images onto the image plane of the other image. To transform each pixel to its projected location, you compute its projected location x' = H * x using homogeneous coordinates (basically, take your 2D image coordinate, add a 1.0 as the third component, apply the matrix H, and go back to 2D by dividing by the third component of the result).
The most efficient way to do this for every pixel is to write this matrix multiplication in homogeneous space using CoreImage. CoreImage offers multiple shader kernel types: CIColorKernel, CIWarpKernel and CIKernel. For this task, we only want to transform the location of each pixel, so a CIWarpKernel is what you need. Using the Core Image Shading Language, that would look as follows:
import CoreImage
let warpKernel = CIWarpKernel(source:
"""
kernel vec2 warp(mat3 homography)
{
vec3 homogen_in = vec3(destCoord().x, destCoord().y, 1.0); // create homogeneous coord
vec3 homogen_out = homography * homogen_in; // transform by homography
return homogen_out.xy / homogen_out.z; // back to normal 2D coordinate
}
"""
)
Note that the shader wants a mat3 called homography, which is the shading-language equivalent of the simd_float3x3 matrix H. When calling the shader, the matrix is expected to be stored in a CIVector; to convert it, use:
let (col0, col1, col2) = yourHomography.columns
let homographyCIVector = CIVector(values:[CGFloat(col0.x), CGFloat(col0.y), CGFloat(col0.z),
CGFloat(col1.x), CGFloat(col1.y), CGFloat(col1.z),
CGFloat(col2.x), CGFloat(col2.y), CGFloat(col2.z)], count: 9)
When you apply the CIWarpKernel to an image, you have to tell CoreImage how big the output should be. To merge the warped and reference image, the output should be big enough to cover the whole projected and original image. We can compute the size of the projected image by applying the homography to each corner of the image rect (this time in Swift, CoreImage calls this rect the extent):
/**
* Convert a 2D point to a homogeneous coordinate, transform by the provided homography,
* and convert back to a non-homogeneous 2D point.
*/
func transform(_ point:CGPoint, by homography:matrix_float3x3) -> CGPoint
{
let inputPoint = float3(Float(point.x), Float(point.y), 1.0)
var outputPoint = homography * inputPoint
outputPoint /= outputPoint.z
return CGPoint(x:CGFloat(outputPoint.x), y:CGFloat(outputPoint.y))
}
func computeExtentAfterTransforming(_ extent:CGRect, with homography:matrix_float3x3) -> CGRect
{
let points = [transform(extent.origin, by: homography),
transform(CGPoint(x: extent.origin.x + extent.width, y:extent.origin.y), by: homography),
transform(CGPoint(x: extent.origin.x + extent.width, y:extent.origin.y + extent.height), by: homography),
transform(CGPoint(x: extent.origin.x, y:extent.origin.y + extent.height), by: homography)]
var (xmin, xmax, ymin, ymax) = (points[0].x, points[0].x, points[0].y, points[0].y)
points.forEach { p in
xmin = min(xmin, p.x)
xmax = max(xmax, p.x)
ymin = min(ymin, p.y)
ymax = max(ymax, p.y)
}
let result = CGRect(x: xmin, y:ymin, width: xmax-xmin, height: ymax-ymin)
return result
}
let ciFloatingImage = CIImage(image: floatingImage)!
let warpedExtent = computeExtentAfterTransforming(ciFloatingImage.extent, with: homography.inverse)
let outputExtent = warpedExtent.union(ciFloatingImage.extent)
Now you can create a warped version of your floating image:
let ciWarpedImage = warpKernel.apply(extent: outputExtent, roiCallback:
{
(index, rect) in
return computeExtentAfterTransforming(rect, with: homography.inverse)
},
image: ciFloatingImage,
arguments: [homographyCIVector])!
The roiCallback is there to tell CoreImage which part of the input image is needed to compute a certain part of the output. CoreImage uses this to apply the shader on parts of the image block by block, such that it can process huge images. (See Creating Custom Filters in Apple's docs). A quick hack would be to always return CGRect.infinite here, but then CoreImage can't do any block-wise magic.
And lastly, create a composite image of the reference image and the warped image:
let ciReferenceImage = CIImage(image: referenceImage)!
let ciResultImage = ciWarpedImage.composited(over: ciReferenceImage)
let resultImage = UIImage(ciImage: ciResultImage)
This question already has an answer here: Scenekit shape between 4 points (closed as a duplicate 4 years ago).
Similar to some of the measuring apps you can see being demonstrated in ARKit, I have a plane with 2 marker nodes on it and a line drawn between the two. What I need, though, is an SCNPlane between the two. So, if your original plane was the floor and you put a marker on either side of a wall, you could represent the physical wall with an SCNPlane in your AR world.
Currently I'm placing the line with the following code:
let line = SCNGeometry.lineFrom(vector: firstPoint.position, toVector: secondPoint.position)
let lineNode = SCNNode(geometry: line)
lineNode.geometry?.firstMaterial?.diffuse.contents = UIColor.white
sceneView.scene.rootNode.addChildNode(lineNode)
lineFrom:
extension SCNGeometry {
class func lineFrom(vector vector1: SCNVector3, toVector vector2: SCNVector3) -> SCNGeometry {
let indices: [Int32] = [0, 1]
let source = SCNGeometrySource(vertices: [vector1, vector2])
let element = SCNGeometryElement(indices: indices, primitiveType: .line)
return SCNGeometry(sources: [source], elements: [element])
}
}
I know there are similar questions out there: 35002232, for example. But I think what I'm after is simpler. There is an answer there by a user, Windchill, that I can almost get to work with a plane, but I can't help thinking that, since a plane is a simpler object, there must be a simpler solution.
All I need is for the plane to have a width equal to the distance between the 2 points (which I already know how to compute), and the height isn't important.
Distance calc:
let position = SCNVector3Make(secondPoint.position.x - firstPoint.position.x, secondPoint.position.y - firstPoint.position.y, secondPoint.position.z - firstPoint.position.z)
let result = sqrt(position.x*position.x + position.y*position.y + position.z*position.z)
Thanks
You can create a node between 2 vectors in ARKit.
See GitHub project here, https://github.com/max6363/ARKit-LineNode-Between-2-Points.
Keep rocking.... Enjoy.... :)
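If you just want the plane itself, here is a minimal sketch of the idea, assuming firstPoint and secondPoint are your marker nodes, sceneView is your ARSCNView, and the wall height is whatever you choose:
// Sketch: an SCNPlane whose width is the distance between the two markers,
// positioned at their midpoint and rotated to lie along the line between them.
let start = firstPoint.position
let end = secondPoint.position
let dx = end.x - start.x
let dy = end.y - start.y
let dz = end.z - start.z
let distance = sqrt(dx*dx + dy*dy + dz*dz)

let wallHeight: CGFloat = 2.5   // pick whatever height you need
let plane = SCNPlane(width: CGFloat(distance), height: wallHeight)
plane.firstMaterial?.diffuse.contents = UIColor.white.withAlphaComponent(0.5)
plane.firstMaterial?.isDoubleSided = true

let planeNode = SCNNode(geometry: plane)
planeNode.position = SCNVector3Make((start.x + end.x) / 2,
                                    (start.y + end.y) / 2 + Float(wallHeight) / 2,
                                    (start.z + end.z) / 2)
// Rotate around Y so the plane's width runs from one marker to the other.
planeNode.eulerAngles.y = -atan2f(dz, dx)
sceneView.scene.rootNode.addChildNode(planeNode)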