I am building a game in Unity with two Azure Kinects. How do I calibrate them so I can get the positional data of a body and solve occlusion?
Currently I get two bodies for each person. How can I map the two virtual bodies (one from each camera) to each individual person?
Your idea is a good one, as multiple-camera setups offer a way to increase the coverage of the captured human body and to minimize occlusions.
Please go through the document Benefits of using multiple Azure Kinect DK devices to read more on filling in occlusions. Although the Azure Kinect DK data transformations produce a single image, the two cameras (depth and RGB) are actually a small distance apart, and that offset is what makes occlusions possible. Use the Kinect SDK to capture the depth data from both devices and store it in separate matrices, then align the two matrices using a 3D registration algorithm. This will let you map the data from one device to the other, taking into account the relative position and orientation of each device.
Please refer to this article by Nadav Eichler:
Spatio-Temporal Calibration of Multiple Kinect Cameras Using 3D Human Pose
Quoted:
When using multiple cameras, two main requirements must be fulfilled in order to fuse the data across cameras:
Camera Synchronization (alignment between the cameras' clocks).
Multi-Camera Calibration (calculating the mapping between the cameras' coordinate systems).
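As a concrete starting point for the calibration step, here is a minimal Python/numpy sketch, assuming the two cameras are already synchronized and each reports 3D joint positions for the same person at the same instant: a Kabsch/SVD fit recovers the rigid transform between the two coordinate systems, and the duplicate bodies can then be paired by distance. All function and variable names are illustrative, not part of the Azure Kinect SDK.

import numpy as np

def fit_camera_to_camera(joints_b, joints_a):
    """Kabsch/SVD fit of the rigid transform (R, t) with R @ b + t ≈ a.
    joints_a / joints_b: (N, 3) arrays of corresponding 3D joint positions of
    the SAME person, captured at the SAME (synchronized) instant by camera A
    and camera B; accumulate them over many frames for a stable fit."""
    centroid_a = joints_a.mean(axis=0)
    centroid_b = joints_b.mean(axis=0)
    H = (joints_b - centroid_b).T @ (joints_a - centroid_a)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = centroid_a - R @ centroid_b
    return R, t

def pair_bodies(bodies_a, bodies_b, R, t, max_dist=0.3):
    """Map camera B's skeletons into camera A's frame and pair each one with
    the nearest camera-A skeleton (pelvis-to-pelvis distance, joint index 0).
    max_dist is in the same unit as the joint positions (0.3 assumes metres).
    Unpaired bodies are people that only one camera can see, i.e. the
    occlusion cases the second camera is there to fill in."""
    pairs = []
    for jb in bodies_b:
        jb_in_a = jb @ R.T + t
        dists = [np.linalg.norm(jb_in_a[0] - ja[0]) for ja in bodies_a]
        best = int(np.argmin(dists)) if dists else -1
        pairs.append(best if best >= 0 and dists[best] < max_dist else None)
    return pairs

In practice you would accumulate joint correspondences over many frames and poses before fitting, and only then use the transform to fuse or pick joints from whichever camera has the clearer view of each person.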
Related
I want to simulate, in real time in Unity, the GPS data (latitude / longitude / altitude) of an aircraft moving in another flight simulator. The aircraft in Unity should act the same as the aircraft in the other simulator.
As is known, Unity uses an xyz coordinate system. I have studied many examples of transforming these two kinds of data into one another, but in all of them problems occur in the coordinate transformation and the aircraft moves differently. I still do not understand how to do it.
Is there an easy formula for realizing this transformation?
Here are a few examples of the instantaneous data I receive from the simulator:
<GPS>
<Lat>21.325352</Lat>
<Long>-157.929607</Long>
<Al>885.512322</Al>
</GPS>
<GPS>
<Lat>21.325356</Lat>
<Long>-157.929555</Long>
<Al>886.829367</Al>
</GPS>
<GPS>
<Lat>21.325357</Lat>
<Long>-157.929540</Long>
<Al>887.487356</Al>
</GPS>
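One common way to handle this (a sketch, not taken from this thread) is to treat the first GPS fix as the Unity origin and use a flat-earth / equirectangular approximation. The axis mapping below (x = east, y = up, z = north), the metres-per-degree constant, and the assumption about the altitude unit are all things you may need to adapt to your simulator:

import math

# Reference fix: the first GPS sample above becomes the Unity origin.
LAT0, LON0, ALT0 = 21.325352, -157.929607, 885.512322
M_PER_DEG_LAT = 111_320.0  # approximate metres per degree of latitude

def gps_to_unity(lat, lon, alt):
    """Equirectangular (flat-earth) approximation: fine for the small distances
    between consecutive simulator updates, not for global-scale positioning.
    Returns (x, y, z) with the assumed mapping x = east, y = up, z = north;
    'up' stays in whatever unit the simulator reports the altitude in."""
    north = (lat - LAT0) * M_PER_DEG_LAT
    east = (lon - LON0) * M_PER_DEG_LAT * math.cos(math.radians(LAT0))
    up = alt - ALT0
    return east, up, north

# Second sample point from the data above, expressed relative to the first.
print(gps_to_unity(21.325356, -157.929555, 886.829367))

For larger areas or higher accuracy you would switch to a proper geodetic conversion (ECEF/ENU), but for the small position deltas between consecutive samples like the ones above, this approximation is usually sufficient.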
I would like to use three Kinects v2 running on three computers and then gather their data on one computer (real-time 3D reconstruction using Unity3D). Is it possible to do so, and how? Thank you.
So what you're asking is very doable; it just takes a lot of work.
For reference, I'll refer to each frame of the 3D point cloud gathered by a Kinect as an image.
All you need is to set up a program on each of your Kinect computers that runs as a client. The remaining computer runs as a server and has the clients send packets of images with some other data attached.
At a minimum, the data you'll need attached to each image is that Kinect's angle and position relative to an 'origin'.
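A minimal sketch of that client/server exchange in Python (sockets plus numpy); the frame format and function names are placeholders, not an existing protocol:

import socket
import struct
import numpy as np

HEADER = "!I4f"  # point count, yaw (deg), x, y, z offset from the shared origin

def send_frame(sock, points, yaw_deg, position):
    """Client side (each Kinect PC): ship one point-cloud frame plus the pose
    metadata the server needs to place it in the shared coordinate frame."""
    payload = points.astype(np.float32).tobytes()
    sock.sendall(struct.pack(HEADER, points.shape[0], yaw_deg, *position) + payload)

def recv_exact(conn, size):
    buf = b""
    while len(buf) < size:
        chunk = conn.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("client disconnected")
        buf += chunk
    return buf

def recv_frame(conn):
    """Server side (the gathering PC): read one frame back out of the stream."""
    count, yaw_deg, x, y, z = struct.unpack(HEADER, recv_exact(conn, struct.calcsize(HEADER)))
    points = np.frombuffer(recv_exact(conn, count * 12), dtype=np.float32).reshape(count, 3)
    return points, yaw_deg, (x, y, z)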
For this to work properly you need to be able to reference the data from all your Kinects to each other. The easiest way to do this is to pick a known point and measure each Kinect's distance from that point and the angle it faces relative to north and sea level.
Once you have all that data you can take each image from each computer, rotate the point clouds using trigonometry, and then combine all the data, as in the sketch below. Combining the data is something you'll have to play with, as there are loads of different ways to do it and it will depend on your application.
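Here is that rotate-and-combine step in Python/numpy, assuming each Kinect's pose is described by a yaw angle and an offset from the shared origin (the poses and cloud variables below are made-up placeholders):

import numpy as np

def to_shared_frame(points, yaw_deg, position):
    """Rotate one Kinect's point cloud about the vertical (y) axis by the
    heading you measured for it, then translate by its measured offset from
    the shared origin. points: (N, 3) array in that Kinect's local frame."""
    yaw = np.radians(yaw_deg)
    rot_y = np.array([[ np.cos(yaw), 0.0, np.sin(yaw)],
                      [ 0.0,         1.0, 0.0        ],
                      [-np.sin(yaw), 0.0, np.cos(yaw)]])
    return points @ rot_y.T + np.asarray(position, dtype=float)

# Example with three made-up poses; cloud_a/b/c would be the (N, 3) clouds
# received from the three client machines. If the Kinects are also tilted,
# extend this with pitch/roll rotations measured the same way.
cloud_a = cloud_b = cloud_c = np.zeros((1, 3))
merged = np.vstack([
    to_shared_frame(cloud_a, yaw_deg=0.0,   position=(0.0, 0.9,  0.0)),
    to_shared_frame(cloud_b, yaw_deg=120.0, position=(2.5, 0.9,  1.0)),
    to_shared_frame(cloud_c, yaw_deg=240.0, position=(-2.5, 0.9, 1.0)),
])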
I am looking for code or an application which can extract the salient object out of a video, considering both context and motion,
or
an algorithm just for motion saliency map detection (motion contrast), so I can fuse it with a context-aware salient object detector that I already have.
I have already tested a context-aware saliency map detector, but in some frames it detects part of the background as the salient object. I want to involve motion and time in the detection so that I can extract the exact salient object as accurately as possible.
Can anyone help me?
One of the most popular approaches (although a bit dated) in the computer vision community is the graph-based visual saliency (GBVS) model.
It uses a graph-based method to compute visual saliency. First, the same feature maps as in the FSM model are extracted, leading to three multiscale feature maps: colors, intensity, and orientations. Then a fully connected graph is built over all grid locations of each feature map and a weight is assigned between each pair of nodes; this weight depends on the spatial distance between the nodes and on the difference in their feature-map values. Finally, each graph is treated as a Markov chain to build an activation map, in which nodes that are highly dissimilar to their surrounding nodes are assigned high values. All activation maps are ultimately merged into the final saliency map.
You can find the MATLAB source code here: http://www.vision.caltech.edu/~harel/share/gbvs.php
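If you want to prototype the idea outside MATLAB, below is a heavily simplified Python/numpy sketch of the graph/Markov-chain step for a single feature map. It is illustrative only, not the reference GBVS implementation linked above:

import numpy as np

def activation_map(feature_map, sigma=0.15):
    """Simplified GBVS-style activation for one feature map (e.g. the intensity
    channel at one scale), already downsampled to a small grid. A fully
    connected graph is built over the grid cells, edge weights combine feature
    dissimilarity with a spatial falloff, the graph is treated as a Markov
    chain, and the chain's stationary distribution is the activation map."""
    h, w = feature_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel() / h, xs.ravel() / w], axis=1)  # normalized cell positions
    vals = feature_map.ravel().astype(np.float64)

    # w(i, j) = |F(i) - F(j)| * exp(-d(i, j)^2 / (2 * sigma^2))
    dissimilarity = np.abs(vals[:, None] - vals[None, :])
    dist2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    weights = dissimilarity * np.exp(-dist2 / (2.0 * sigma ** 2))

    # Row-normalize into a Markov transition matrix; power-iterate to the
    # stationary distribution, where mass piles up on cells unlike their surround.
    transition = weights / (weights.sum(axis=1, keepdims=True) + 1e-12)
    pi = np.full(h * w, 1.0 / (h * w))
    for _ in range(200):
        pi = pi @ transition
    return (pi / (pi.max() + 1e-12)).reshape(h, w)

# Example on a random "feature map"; real use would feed color, intensity and
# orientation maps at several scales and merge the resulting activation maps.
example = activation_map(np.random.rand(24, 32))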
Recently I was given a multi-view 3D scanning project to complete within two weeks, and I have searched through books, journals, and websites on 3D reconstruction, including the MathWorks examples. I have written code to track matched points between two images and reconstruct them into a 3D plot. However, despite using the detectSURFFeatures() and extractFeatures() functions, some of the object points are still not tracked. How can I also reconstruct those in my 3D model?
What you are looking for is called "dense reconstruction". The best way to do this is with calibrated cameras. Then you can rectify the images, compute disparity for every pixel (in theory), and then get 3D world coordinates for every pixel. Please check out this Stereo Calibration and Scene Reconstruction example.
The tracking approach you are using is fine but will only get sparse correspondences. The idea is that you would use the best of these to try to determine the difference in camera orientation between the two images. You can then use the camera orientation to get better matches and ultimately to produce a dense match which you can use to produce a depth image.
Tracking every point in an image from frame to frame is hard (it's called scene flow), and you won't achieve it by identifying individual features (such as SURF, ORB, FREAK, SIFT, etc.) because these features are by definition 'special' in that they can be clearly identified between images.
If you have access to MATLAB's Computer Vision Toolbox you could use its matching functions.
You can start, for example, by checking out this article about disparity and the related MATLAB functions.
In addition, you can read about different matching techniques such as block matching, semi-global block matching, and global optimization procedures, just to name a few keywords. But be aware that stereo matching is a huge topic.
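If OpenCV is an option alongside the MATLAB toolbox, the pipeline described above (rectify, compute a per-pixel disparity, reproject to 3D) looks roughly like the sketch below; the calibration inputs and images are parameters you supply from your own setup:

import cv2
import numpy as np

def dense_reconstruction(imgL, imgR, K1, d1, K2, d2, R, T):
    """Rectify a calibrated stereo pair, compute per-pixel disparity with
    semi-global block matching, and reproject every valid pixel to 3D.
    K1/d1, K2/d2 are the two cameras' intrinsics/distortion, R/T the rotation
    and translation between them (the output of a stereo calibration);
    imgL/imgR are the grayscale input images."""
    h, w = imgL.shape[:2]
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, (w, h), R, T)
    m1x, m1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, (w, h), cv2.CV_32FC1)
    m2x, m2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, (w, h), cv2.CV_32FC1)
    rectL = cv2.remap(imgL, m1x, m1y, cv2.INTER_LINEAR)
    rectR = cv2.remap(imgR, m2x, m2y, cv2.INTER_LINEAR)

    # Semi-global block matching: a disparity value for (almost) every pixel,
    # not just the sparse SURF matches.
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5,
                                 P1=8 * 5 * 5, P2=32 * 5 * 5,
                                 uniquenessRatio=10, speckleWindowSize=100)
    disparity = sgbm.compute(rectL, rectR).astype(np.float32) / 16.0  # SGBM output is fixed-point * 16

    # Reproject every pixel with a valid disparity to 3D world coordinates.
    points3d = cv2.reprojectImageTo3D(disparity, Q)
    return points3d[disparity > 0]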
I'm currently trying to implement a silhouette algorithm in my project (using OpenGL ES; it's for mobile devices, primarily iPhone at the moment). One of the requirements is that a set of 3D lines be drawn. The issue with the default OpenGL lines is that they don't connect nicely at an angle when they are thick (gaps appear). Other subtle artifacts are also evident, which detract from the visual appeal of the lines.
Now, I have looked into using some sort of quad strip as an alternative to this. However, drawing a quad strip in screen space requires some sort of visibility detection - lines obscured in the actual 3D world should not be visible.
There are numerous approaches to this problem, e.g. quantitative invisibility. But such an approach, particularly on a mobile device with limited processing power, is difficult to implement efficiently, considering that raycasting needs to be employed. Looking around some more I found this paper, which describes a couple of methods for using z-buffer sampling to achieve such an effect. However, I'm not an expert in this area, and while I understand the theory behind the techniques to an extent, I'm not sure how to go about the practical implementation. I was wondering if someone could guide me here at a more technical level, on the OpenGL ES side of things. I'm also open to any suggestions regarding 3D line visibility in general.
The z-buffer technique will be too complex for iOS devices: it needs a heavy pixel shader and (IMHO) it will introduce some visual artifacts.
If your models are not complex you can find the geometric silhouette at runtime, for example by comparing the normals of polygons that share an edge: if the z values of those normals in view space have different signs (one normal points toward the camera and the other away from it), then that edge should be used for the silhouette.
Another approach is more "FPS friendly": keep an extruded version of your model, render the extruded model first in the silhouette color (without textures or lighting), and then render the normal model over it. You will need more memory for vertices, but no real-time computation.
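The extruded copy can be precomputed offline; a tiny Python/numpy sketch of that preprocessing step (names and the width value are placeholders):

import numpy as np

def extruded_shell(vertices, vertex_normals, outline_width=0.02):
    """Precompute the 'fat' copy of a mesh by pushing every vertex out along
    its unit vertex normal. The extruded copy is drawn first in the silhouette
    color, then the normal model is drawn on top. outline_width is in model
    units and is just an illustrative value."""
    unit = vertex_normals / np.linalg.norm(vertex_normals, axis=1, keepdims=True)
    return vertices + unit * outline_width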
PS: In all the games I have looked at, the silhouettes were geometric.
I have worked out a solution that works nicely on an iPhone 4S (not tested on any other devices). It builds on the idea of rendering world-space quads, and does the silhouette detection all on the GPU. It works along these lines (pun not intended):
We generate edge information. This consists of a list of edges/"lines" in the mesh, and for each we associate two normals which represent the tris on either side of the edge.
This is processed into a set of quads that are uploaded to the GPU - each quad represents an edge. Each vertex of each quad is accompanied by three attributes (vec3s), namely the edge direction vector and the two neighbor tri normals. All quads are passed w/o "thickness" - i.e. the vertices on either end are in the same position. However, the edge direction vector is opposite for each vertex in the same position. This means they will extrude in opposite directions to form a quad when required.
We determine whether a vertex is part of a visible edge in the vertex shader by performing two dot products between each tri norm and the view vector and checking if they have opposite signs. (see standard silhouette algorithms around the net for details)
For vertices that are part of visible edges, we take the cross product of the edge direction vector with the view vector to get a screen-oriented "extrusion" vector. We add this vector to the vertex, but divided by the w value of the projected vertex in order to create a constant thickness quad.
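For reference, the per-vertex test and extrusion just described can be summarized in this CPU-side Python sketch; the real work happens in the GLES vertex shader, and all names here are illustrative rather than actual attribute names:

import numpy as np

def silhouette_vertex(pos_ws, edge_dir, normal_a, normal_b, cam_pos, clip_w, half_width):
    """pos_ws: edge-vertex position in world space; edge_dir: the signed
    edge-direction attribute (opposite sign on the two co-located vertices);
    normal_a / normal_b: the two neighboring triangle normals;
    clip_w: w of the projected vertex."""
    view_vec = cam_pos - pos_ws
    # Silhouette test: the edge is visible when one neighboring face points
    # toward the camera and the other away (the dot products differ in sign).
    if np.dot(normal_a, view_vec) * np.dot(normal_b, view_vec) >= 0.0:
        return pos_ws                      # not a silhouette edge: quad stays degenerate
    # Screen-oriented extrusion vector = edge direction x view vector, divided
    # by the projected vertex's w (as described above) for constant thickness.
    extrude = np.cross(edge_dir, view_vec)
    extrude /= np.linalg.norm(extrude)
    return pos_ws + extrude * (half_width / clip_w)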
This does not directly resolve the gaps that can appear between neighboring edges, but it is far more flexible when it comes to combating them. One solution may involve bridging the vertices of lines that meet at a large angle with another quad, which I am exploring at the moment.