I am currently trying to create an iPhone app that can recognize objects in an image, such as a car, bus, building, bridge, or human, and label them with the object name, with the help of the Internet.
Is there any free service that provides a solution to this problem? Object recognition is itself a complex task involving digital image processing, neural networks, and so on.
Can this be done via an API?
If you want to recognise planar images, the current generation of mobile AR SDKs from Metaio, Qualcomm and Layar will let you upload images to match against, and will perform the matching for you.
If you want to match freely against a set of 3D objects, e.g. a Toyota Prius or the Empire State Building, the same techniques might be applied to sets of images taken at different rotations. However, you might have to restrict yourself to a single object because of limits on how large an image database you can have with the service, or contact those companies for a custom solution; and it may not work very reliably, given that the state of the art is reliable matching against planar images only.
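For background, the planar matching these SDKs perform is essentially natural-feature matching against your uploaded reference image. As a rough illustration of the underlying idea (not any vendor's actual pipeline), here is a minimal Python/OpenCV sketch that matches ORB features between a reference image and a camera frame and fits a RANSAC homography; the file names and thresholds are placeholders.

```python
import cv2
import numpy as np

# Placeholder file names - substitute your own reference image and camera frame.
reference = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp_ref, des_ref = orb.detectAndCompute(reference, None)
kp_frm, des_frm = orb.detectAndCompute(frame, None)

# Match descriptors and keep only distinctive matches (Lowe's ratio test).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des_ref, des_frm, k=2)
good = [m[0] for m in matches
        if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]

if len(good) >= 10:  # arbitrary threshold for "reference image recognised"
    src = np.float32([kp_ref[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_frm[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # The homography maps the planar reference into the camera frame.
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is not None:
        print("Reference found, inliers:", int(mask.sum()))
else:
    print("Reference not found in this frame.")
```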
If you want to recognize general classes (human, car, building), this is a very difficult problem, and I don't know of any solution fast enough to operate online (which I assume is a requirement, given you want an AR solution - is that a fair assumption?). It's been a few years since I studied CV, but at that time the most promising approach to visual classification was the "bag of visual words" family of methods - you might try reading up on those.
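For reference, a bag-of-visual-words classifier clusters local descriptors into a visual vocabulary and represents each image as a histogram of visual words, which an ordinary classifier can then label. Below is a minimal, illustrative Python/OpenCV sketch, assuming you already have a handful of labelled training images; the file names, vocabulary size and SVM settings are placeholders, not a recipe tuned for real accuracy.

```python
import cv2
import numpy as np

orb = cv2.ORB_create()

def descriptors(path):
    """ORB descriptors of one image, as float32 for k-means (assumes features are found)."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = orb.detectAndCompute(img, None)
    return des.astype(np.float32)

# Hypothetical training data: (image path, class id) pairs you supply yourself.
train = [("car1.jpg", 0), ("car2.jpg", 0), ("person1.jpg", 1), ("person2.jpg", 1)]

# 1. Build a visual vocabulary by k-means clustering all training descriptors.
all_des = np.vstack([descriptors(p) for p, _ in train])
K = 50  # vocabulary size (placeholder)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, _, vocab = cv2.kmeans(all_des, K, None, criteria, 3, cv2.KMEANS_PP_CENTERS)

def bow_histogram(des):
    """Represent an image as a normalized histogram of nearest visual words."""
    dists = np.linalg.norm(des[:, None, :] - vocab[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist, _ = np.histogram(words, bins=K, range=(0, K))
    return (hist / max(hist.sum(), 1)).astype(np.float32)

# 2. Train a simple linear SVM on the word histograms.
X = np.array([bow_histogram(descriptors(p)) for p, _ in train])
y = np.array([c for _, c in train], dtype=np.int32).reshape(-1, 1)
svm = cv2.ml.SVM_create()
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(X, cv2.ml.ROW_SAMPLE, y)

# 3. Classify a new image the same way.
_, pred = svm.predict(bow_histogram(descriptors("query.jpg")).reshape(1, -1))
print("Predicted class id:", int(pred[0][0]))
```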
Take a look at Cortexica. Very useful for this sort of thing.
http://www.cortexica.com/
I haven't done work with mobile AR in a while, but the last time I was working on this stuff I was using Layar and starting to investigate Junaio. Those are oriented toward 3D graphics, not simply text labels, so for your use case you may be better served with OpenCV.
Note that Layar (and I believe Junaio too) works like a web app, where you put the content on your own server and give Layar the URL to link to.
I'm currently building a HoloLens application and have a feature in-mind that requires holograms to be dynamically created, placed, and to persist between sessions. Those holograms don't need to be shared between devices.
I've had a nightmare trying to find (working) implementations and documentation for Unity WorldAnchors, with Azure Spatial Anchors seeming to stomp out most traces of them. Thankfully I've gotten past that and have managed to implement WorldAnchors by using the older HoloToolkit, since documentation for WorldAnchors in the newer MRTK also seems to have disappeared.
MY QUESTION (because I am unable to find any docs for it) is how do WorldAnchors work?
I'd hazard a guess that it's based on spatial mapping, which presents the limitation that if you have two identical rooms, or objects move in the original room, the anchors are going to be lost.
What I'd LIKE to hear is that it's some magical management of transforms, which means my app has an understanding of its change in real-world location between uses even if the app is launched from a different location each time.
Does anybody know the answer or where I might look (beyond the limited Unity and MS Docs for this matter) to find out implementation details?
Thank you.
I'd hazard a guess that it's based on spatial mapping, which presents the limitation that if you have two identical rooms, or objects move in the original room, the anchors are going to be lost.
We won't divulge the internal implementation details of the World Anchor, but we can state that it is not based on GPS on either HoloLens v1 or HoloLens v2. Currently, the World Anchor uses the data in the spatial map for placement. The key underlying piece is that anchors rely on spatial scanning, and the scanning can use Wi-Fi to improve its speed and accuracy; see these two references: 1 & 2
What I'd LIKE to hear is that it's some magical management of transforms, which means my app has an understanding of its change in real-world location between uses even if the app is launched from a different location each time.
It is certainly possible for two identical rooms with the exact same layout to trick the mapping into thinking they are the same room. We document that here:
https://learn.microsoft.com/en-us/windows/mixed-reality/coordinate-systems#headset-tracks-incorrectly-due-to-identical-spaces-in-an-environment
I'm currently doing a project with Microsoft's HoloLens. The problem is that the HoloLens's memory is limited, so I can only make a spatial mapping of a room, not of a building, because it can't remember the whole building. I had an idea: maybe I can create several objects and assemble them? But nobody talks about this... Do you think it's possible?
Thanks for reading.
Y.P
Since you don't have a compass, you could establish some convention to help. For example, you could start the scanning with a voice command (and stop it with another one), and decide to only start scanning when you're facing north. Then it would be easy to know the orientation of each room. What may be harder is to get the angle exactly right: your head might be off by a few degrees, and you may have to work some "magic" (post-processing) to correct it.
Or placing QR codes on a wall (printer paper + scotch tape) and using something like Vuforia can help you avoid this orientation problem altogether (you would get the QR code’s orientation which would match that of the wall).
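To illustrate the marker idea independently of Vuforia: OpenCV ships a QR detector that returns the code's corner points, from which you can estimate the wall's in-image orientation. A rough Python sketch, with the image path as a placeholder:

```python
import cv2
import numpy as np

img = cv2.imread("wall_photo.jpg")  # placeholder image containing a printed QR code

detector = cv2.QRCodeDetector()
data, corners, _ = detector.detectAndDecode(img)

if corners is not None and len(corners) > 0:
    pts = corners.reshape(-1, 2)      # 4 corners of the code (order per OpenCV's QRCodeDetector)
    top_edge = pts[1] - pts[0]        # vector along one edge of the code
    angle = np.degrees(np.arctan2(top_edge[1], top_edge[0]))
    print(f"QR payload: {data!r}, in-image rotation: {angle:.1f} degrees")
else:
    print("No QR code found.")
```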
You can also simplify the scanned mesh and convert it to planes. That way you can remember simpler objects instead of the raw spatial mapping mesh. (Search for the SurfaceToPlanes script in the Holographic Academy tutorials).
Scanning, the first layer (the HoloLens trying to reason about the environment), is an unstoppable process. There is no API for starting or stopping it, and as far as I know it also slowly consumes more and more memory. The only thing you can do is delete space (i.e. delete holograms) or cover the sensors. But that's at the OS/hardware level, not the app level, which is presumably what you want.
Layer two, which is probably what you are talking about, is starting and stopping the spatial reconstruction process, where that raw spatial data is processed into a low-poly mesh (aka spatial mapping). This process can be started and stopped, for example through Unity's SpatialMappingCollider and SpatialMappingRenderer components if you use Unity.
Finally, the third layer is extracting objects/segments from that spatial mapping mesh into primitives, like that SurfaceToPlanes script. You can also fully control when that happens.
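To make the third layer concrete: conceptually, converting the mesh to planes means fitting planes to clusters of mesh vertices and keeping those instead of the raw triangles. The Unity script is C#, but here is a small, language-neutral Python sketch of that core step, a RANSAC plane fit over 3D vertices; the vertex array and threshold are placeholders, not the actual SurfaceToPlanes algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def ransac_plane(points, iters=200, threshold=0.02):
    """Fit one dominant plane (normal n, offset d with n.x + d = 0) to Nx3 points via RANSAC."""
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(iters):
        # Pick 3 random points and compute the plane through them.
        a, b, c = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(b - a, c - a)
        if np.linalg.norm(n) < 1e-9:
            continue  # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        d = -np.dot(n, a)
        dist = np.abs(points @ n + d)
        inliers = dist < threshold  # distance threshold in meters (placeholder)
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (n, d)
    return best_plane, best_inliers

# Placeholder vertices: in practice these would come from the spatial mapping mesh.
vertices = rng.normal(size=(500, 3)) * [2.0, 2.0, 0.01]  # a roughly flat "floor"
(plane_normal, plane_d), inliers = ransac_plane(vertices)
print("Plane normal:", plane_normal, "inliers:", int(inliers.sum()))
```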
There has been a lot of confusion, especially due to renaming sprees in the MixedRealityToolkit (overuse of the word "Scanning") and Unity (SpatialAnchor to WorldAnchor, etc.), and misleading tutorials that use a lot of colloquialisms instead of crisp terminology.
Theory aside: if you want the HoloLens to think of your entire building as one continuous space in terms of the first layer, you're out of luck. It was designed for a living room, and there is a lot of voodoo involved in making it work stably in facilities of 30x30 meters. You probably want to rely on disjointed "islands" with specific detection anchors to identify where you are, or rely on markers and coordinates relative to them.
Cheers
I want to detect items in an image (like Core Image does for a face), but the items aren't faces. What can I use to do so?
I have an image with a few items: a car, a person, a tree and a mailbox. I want to crop the image around each item and create a subimage of each, so I would end up with one image of the car, one of the person and one of the mailbox. There may be overlap with other objects, but the predominant feature in each would be the main object.
Thanks
This is a surprisingly complicated topic of ongoing research in the field of computer vision. There are many good academic papers written on the topic (here's a nice video) and no publicly available turnkey solutions.
I don't think Core Image currently supports this kind of functionality, nor will it in the near future.
However, your best bet is to start by checking out the now well-established OpenCV library, maintained by Willow Garage and available for all major operating systems (including iOS and Android). The following link might help you towards what you are looking for:
OpenCV object detection tutorials
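As a concrete example of what those tutorials cover, the classic OpenCV route is a Haar/LBP cascade trained for your object class. OpenCV only bundles cascades for faces, eyes and bodies, so a car or mailbox cascade would have to be trained or found separately; the XML and image paths below are placeholders.

```python
import cv2

# Placeholder: a cascade you trained or downloaded for your object class.
cascade = cv2.CascadeClassifier("cars_cascade.xml")
img = cv2.imread("street.jpg")               # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Multi-scale sliding-window detection; parameters are typical starting values.
boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4, minSize=(48, 48))

for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    sub = img[y:y + h, x:x + w]              # the cropped "subimage" the question asks for
    cv2.imwrite(f"object_{x}_{y}.png", sub)

print(f"Found {len(boxes)} candidate objects.")
```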
Alternatively you could try out augmented reality toolkits designed specifically for tracking known targets. Some good examples are:
Metaio,
Vuforia,
ARLab,
String,
Junaio
EDIT, Nov 2016
Although CoreImage still does not support this, it is somewhat more likely that it may support it in the future. Recent years have seen a dramatic increase in the availability of object detection frameworks that use deep networks to perform object classification and localization.
A good first place to start would be to look at projects that use TensorFlow for Android and iOS.
One such link.
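To sketch what such a project boils down to in Python (the TensorFlow Hub model URL below is just one published detector used as an example; check its page for the exact input/output signature before relying on it):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Illustrative model choice: an SSD MobileNet detector published on TensorFlow Hub.
detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")

img = tf.io.decode_jpeg(tf.io.read_file("street.jpg"))     # placeholder image path
batch = tf.expand_dims(tf.cast(img, tf.uint8), 0)          # model expects uint8 [1, H, W, 3]

result = detector(batch)
boxes = result["detection_boxes"][0].numpy()                # normalized [ymin, xmin, ymax, xmax]
scores = result["detection_scores"][0].numpy()
classes = result["detection_classes"][0].numpy().astype(int)

for box, score, cls in zip(boxes, scores, classes):
    if score > 0.5:                                         # arbitrary confidence threshold
        print(f"class id {cls} with score {score:.2f} at {box}")
```

For on-device use you would then convert such a model with TensorFlow Lite, or (per the later edit) Core ML.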
EDIT, Dec 2017
This is now fairly standard across all major mobile and desktop computing platforms (amazing how much changes in only one year). Specifically for Apple, you can look at CoreML.
I am starting an AR project for a client which involves using AR in order to show information about certain objects. In this project, for example, the user would point the camera at a car. Depending on which part of the car the user is looking at (headlights, windshield) a button would appear. When the user presses that button, an information window would appear on screen, giving the user more information about that certain car part.
The client doesn't wish to place physical markers on the car (QR code / patterns), and so the car parts would have to be detected another way.
I have developed AR apps before, but based on user location and generated markers in the sky. I feel this system wouldn't be entirely relevant for the client's request.
Would anybody be able to point me in the right direction (an iOS library) for this sort of project, and tell me whether or not it would be feasible?
Thanks for the input,
Andy.
What you need is a model-based tracker / 6DOF object tracker. As you want to track a car, it will certainly be featureless (or you will only get sparse features), so you should look at textureless, non-planar 3D (object) tracking solutions.
It's pretty much state of the art right now (a lot of research, few products/SDKs), but using a library like OpenCV and the appropriate literature (see below) you should be able to develop one. You can look at an open-source solution like the ViSP library, which has a module for model-based tracking but no official iOS port. For commercial libraries, the closest will be AR libraries supporting SLAM or "3D object tracking".
In terms of techniques, you have different ways to handle this problem; some pointers (a minimal pose-estimation sketch follows the list):
You can use a model-based tracker relying on edge detection + an initial CAD model of the object: "3D Textureless Object Detection and Tracking: An Edge-based Approach", or Harald Wuest, Folker Wientapper, Didier Stricker, "Adaptable Model-based Tracking Using Analysis-by-Synthesis Techniques", 12th International Conference on Computer Analysis of Images and Patterns (CAIP), 27-29 August 2007, Vienna, Austria.
You can use a model-based tracker relying on edge detection + (trained) template images
You can use some SLAM techniques combined with a model-based tracker: M. Tamaazousti, V. Gay-Bellile, S. Naudet Collette, S. Bourgeois, M. Dhome, "Real-Time Accurate Localization in a Partially Known Environment: Application to Augmented Reality on Textureless 3D Objects", TrakMark 2011, Basel, Switzerland, 26-29/10/2011.
If your system will only run indoors, you can look at RGB-D trackers:
S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, N. Navab, "Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes", Asian Conference on Computer Vision (ACCV), Daejeon, Korea, November 2012.
(access to the software)
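The common thread in the pointers above is estimating the 6DOF pose of a known 3D model from 2D-3D correspondences in every frame. The hard, research-level part is obtaining reliable correspondences on a textureless object (edges, templates, SLAM); once you have them, the pose itself is a standard PnP problem. A minimal Python/OpenCV sketch of that last step, with synthetic correspondences and placeholder intrinsics standing in for a real tracker's output:

```python
import cv2
import numpy as np

# Placeholder camera intrinsics - use your calibrated values in practice.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume no lens distortion for the sketch

# 3D points on the CAD model (object frame, meters). In a real tracker these
# would be model edges/corners; here they are just a made-up box-like shape.
object_points = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.5, 0.3, 0.0],
                          [0.0, 0.3, 0.0], [0.25, 0.15, 0.2], [0.1, 0.1, 0.15]])

# For the sketch, synthesize where an edge/template tracker would have located
# those points in the image, using a known ground-truth pose.
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.0, 0.0, 2.0])
image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, dist)
image_points = image_points.reshape(-1, 2)

# Robust PnP: recover the object's 6DOF pose relative to the camera.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_points, image_points, K, dist)
if ok:
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    print("Recovered rotation:\n", R)
    print("Recovered translation:", tvec.ravel())  # should be close to tvec_true
```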
It seems you are heading into an interesting topic. However, my concern is the accuracy of what you are trying to achieve. Location-based AR would be a starting point for your research, but its granularity is too coarse for your problem domain: since you have worked on location-based AR applications, you may have noticed that the accuracy you can expect is at best around 3 meters. That level of accuracy cannot address your problem in a precise way.
However, I have seen prototypes that address your problem domain. One good example is the BMW Augmented Reality Manual; check this link: http://www.youtube.com/watch?v=P9KPJlA5yds
That said, I have never come across a proper augmented reality library for iOS, or even Android, that can address your problem domain in a marker-less AR context.
The information above is only for your knowledge, not to discourage you in any way.
I need to find a way to model a physical place inside an iPhone application. For example, I want to be able to take images of a restaurant and then use some tool or programming API to model the restaurant as a 3D space, and let the user navigate and explore the place and its rooms.
I have thought about HTML5 inside a web view, but I don't think WebGL is supported by the iPhone's web view (Safari engine).
Can you please recommend a method, API, Commercial Library or anything to help me achieve this task?
First, you need to be able to display 3D models on the iPhone. One of the most popular 3D engines is Unity3D:
http://unity3d.com/
It is extremely easy to start playing with Unity3D. You even have a free license with limited features:
http://unity3d.com/unity/licenses
Then you need to reconstruct a 3D model from pictures. This is not a trivial problem, so it helps if you know some computer vision. You can try playing with OpenCV:
http://opencv.willowgarage.com/wiki/
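To make the OpenCV suggestion more concrete: reconstruction from photos usually starts with two-view structure from motion, i.e. match features between two pictures, estimate the essential matrix, recover the relative camera pose and triangulate points. A rough Python sketch (image paths and intrinsics are placeholders; a real pipeline adds many views and bundle adjustment on top):

```python
import cv2
import numpy as np

# Placeholder inputs: two photos of the restaurant and an approximate camera matrix.
img1 = cv2.imread("room_view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("room_view2.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

# 1. Detect and match features between the two views.
orb = cv2.ORB_create(4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float64([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float64([kp2[m.trainIdx].pt for m in matches])

# 2. Estimate the relative camera pose from the essential matrix.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, pose_mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 3. Triangulate matched points into a sparse 3D point cloud.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
cloud = (pts4d[:3] / pts4d[3]).T  # homogeneous -> Euclidean, shape (N, 3)
print("Triangulated", len(cloud), "points; first few:\n", cloud[:3])
```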
Best regards.
Actually, Nuke from The Foundry has a decent start on the future of creating computer models from images.
Basically, it takes a high-contrast point and tracks it through successive frames. Given hundreds or thousands of tracked points, the next step is to calculate the perspective change between points.
Say two points are a known pixel distance apart at time zero, and a certain time later they are a different distance apart. This change in distance could indicate a bad tracking point. But assuming the two points are tracked perfectly, the distance change could be caused by lateral or rotational camera motion. In real space, a point further away from you changes perspective differently than a closer point; this perspective change is a mathematical certainty.
Initially the tracking is typically used to re-film a piece of footage to stabilize it. But the result of the software's analysis can be saved; it is often called a point cloud. Clusters of nearby points that track very closely together are usually parts of the same surface, so a model can be built from them.
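That tracking step can be reproduced with OpenCV's pyramidal Lucas-Kanade optical flow: pick high-contrast corners and follow them from frame to frame, discarding points that fail to track. A small Python sketch with a placeholder video path:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("footage.mp4")  # placeholder clip
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Pick high-contrast corners to track - the "high contrast points" described above.
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500, qualityLevel=0.01, minDistance=8)

tracks = []  # per-frame surviving point positions: the raw material for a point cloud
while True:
    ok, frame = cap.read()
    if not ok or p0 is None or len(p0) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade: where did each corner move in this frame?
    p1, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None,
                                               winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1  # drop points that could not be tracked
    tracks.append(p1[good].reshape(-1, 2).copy())
    p0, prev_gray = p1[good].reshape(-1, 1, 2), gray

cap.release()
print("Tracked", len(tracks), "frames;",
      len(tracks[-1]) if tracks else 0, "points survived to the end.")
```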
But my friend, we are still barbarians compared to the speed and software that could do this perfectly. Otherwise all the CG artists out there would have nothing left to model in Maya except fantasy monsters and spaceships that don't exist yet...