I want to detect items in an image (the way Core Image detects faces), but the items aren't faces.
What can I use to do so?
I have an image with a few items: a car, a person, a tree and a mailbox. I want to cut the image around each item and create a subimage of each. I would then have one image with a car, one with a person and one with a mailbox. There may be overlap with other objects, but the predominant feature in each would be the main object.
Thanks
This is a surprisingly complicated topic of ongoing research in the field of Computer Vision. There are many good academic papers written on the topic (here's a nice video), but no publicly available turnkey solutions.
I don't think Core Image currently supports this kind of functionality, nor will it in the near future.
However, your best bet is to start by checking out the now well-established OpenCV library, maintained by Willow Garage and available for all major operating systems (including iOS and Android). The following link might help you towards what you are looking for:
OpenCV object detection tutorials
Alternatively you could try out augmented reality toolkits designed specifically for tracking known targets. Some good examples are:
Metaio,
Vuforia,
ARLab,
String,
Junaio
EDIT, Nov 2016
Although Core Image still does not support this, it is somewhat more likely that it will in the future. Recent years have seen a dramatic increase in the availability of object detection frameworks that use deep networks to perform object classification and localization.
A good first place to start would be to look at projects that use TensorFlow for Android and iOS.
One such link.
EDIT, Dec 2017
This is now fairly standard across all major mobile and desktop computing platforms (amazing how much changes in only one year). Specifically for Apple, you can look at Core ML.
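Whichever detector you end up using, the cropping step from the question is the easy part. Here is a minimal sketch, assuming the detector hands back labelled bounding boxes as (x, y, width, height) in pixels. The `crop` and `crop_detections` helpers and the list-of-rows image format are my own illustration, not part of OpenCV, TensorFlow or Core ML.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values (grayscale, to keep the sketch simple)
Box = Tuple[int, int, int, int]  # assumed detector output: (x, y, width, height) in pixels


def crop(image: Image, box: Box, margin: int = 0) -> Image:
    """Return the sub-image covered by `box`, grown by `margin` and clipped to the image."""
    height, width = len(image), len(image[0])
    x, y, w, h = box
    left, top = max(0, x - margin), max(0, y - margin)
    right, bottom = min(width, x + w + margin), min(height, y + h + margin)
    return [row[left:right] for row in image[top:bottom]]


def crop_detections(image: Image,
                    detections: List[Tuple[str, Box]],
                    margin: int = 0) -> List[Tuple[str, Image]]:
    """Produce one (label, sub-image) pair per detection, in detection order."""
    return [(label, crop(image, box, margin)) for label, box in detections]


# Toy 6x4 "image" whose pixel value encodes its position (row * 10 + column).
image = [[row * 10 + col for col in range(6)] for row in range(4)]
# Hypothetical detector output for two of the objects.
detections = [("car", (1, 1, 2, 2)), ("mailbox", (4, 0, 2, 3))]
subimages = crop_detections(image, detections)
```

A non-zero `margin` keeps some context around each object, which is where the overlap with neighbouring objects mentioned in the question would come from.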
Related
I'm reading up on RealityKit and trying to create a city landscape.
I watched this video from Apple about Reality Composer and downloaded the associated project:
https://developer.apple.com/videos/play/wwdc2019/605
My initial goal is to create a city street with tall buildings and a user-controlled character that can walk around the streets and perform tasks.
I've played with Reality Composer, but it doesn't seem like the right tool for creating complex landscapes or characters for this use case (I could be wrong). It seems more like a prototyping tool for a fast POC.
I'm assuming there are tools, such as Sketch, that can open usdz files (I tried googling and searching, but nothing substantial came up).
What is the appropriate workflow for this type of app (game) development?
I would recommend one of two options:
A. Programmatically add and control models within the AR View. This will require a decent knowledge of Swift and a lot of looking around for examples and reading the docs for RealityKit.
B. Switch over to Unity. Unity would be a lot easier to work with and is designed for games (which is what it sounds like you want to make). A bonus is that your game/app will be cross-platform.
I'm currently building a HoloLens application and have a feature in mind that requires holograms to be dynamically created, placed, and to persist between sessions. Those holograms don't need to be shared between devices.
I've had a nightmare trying to find (working) implementations and documentation for Unity WorldAnchors, with Azure Spatial Anchors seeming to stomp out most traces of it. Thankfully I've gotten past that and have managed to implement WorldAnchors by using the older HoloToolkit, since documentation for WorldAnchors in the newer MRTK also seems to have disappeared.
MY QUESTION (because I am unable to find any docs for it) is how do WorldAnchors work?
I'd hazard a guess that it's based on spatial mapping, which presents the limitation that if you have 2 identical rooms or objects that move in the original room, the anchor/s is/are going to be lost.
What I'd LIKE to hear is that it's some magical management of transforms, which means my app has an understanding of its change in real-world location between uses even if the app is launched from a different location each time.
Does anybody know the answer or where I might look (beyond the limited Unity and MS Docs for this matter) to find out implementation details?
Thank you.
"I'd hazard a guess that it's based on spatial mapping, which presents the limitation that if you have 2 identical rooms or objects that move in the original room, the anchor/s is/are going to be lost."
We won't divulge the internal implementation details of the World Anchor, but we can state that it is currently not based on GPS on either HoloLens v1 or HoloLens v2. Currently, the World Anchor uses the data in the spatial map for placement. The key underlying point is that the anchors rely on spatial scanning, and the scanning can use Wi-Fi to improve speed and accuracy; see these two references: 1 & 2
"What I'd LIKE to hear is that it's some magical management of transforms, which means my app has an understanding of its change in real-world location between uses even if the app is launched from a different location each time."
It is certainly possible for two identical rooms with the exact same layout to trick the mapping into thinking they are the same room. We document that here:
https://learn.microsoft.com/en-us/windows/mixed-reality/coordinate-systems#headset-tracks-incorrectly-due-to-identical-spaces-in-an-environment
I am starting an AR project for a client which involves using AR in order to show information about certain objects. In this project, for example, the user would point the camera at a car. Depending on which part of the car the user is looking at (headlights, windshield) a button would appear. When the user presses that button, an information window would appear on screen, giving the user more information about that certain car part.
The client doesn't wish to place physical markers on the car (QR code / patterns), and so the car parts would have to be detected another way.
I have developed AR apps before, but based on user location and generated markers in the sky. I feel this system wouldn't be entirely relevant for the client's request.
Would anybody be able to point me in the right direction (an iOS library) for this sort of project, and tell me whether or not it would be entirely feasible?
Thanks for the input,
Andy.
What you need is a model-based tracker/6DOF object tracker. As you want to track a car, it will certainly be featureless (or you will only get sparse features), so you should look at textureless, non-planar 3D (object) tracking solutions.
It's pretty much state of the art right now (a lot of research, few products/SDKs), but using a library like OpenCV and the appropriate literature (see below) you should be able to develop one. You can look at an open-source solution like the ViSP library, which has a module for model-based tracking but no official iOS port. For commercial libraries, the closest will be AR libraries supporting SLAM or "3D object tracking".
In terms of techniques, there are different ways to handle this problem. Some pointers:
You can use a model-based tracker relying on edge detection + an initial CAD model of the object: 3D Textureless Object Detection and Tracking: An Edge-based Approach, or
Harald Wuest, Folker Wientapper, Didier Stricker. Adaptable Model-based Tracking Using Analysis-by-Synthesis Techniques. The 12th International Conference on Computer Analysis of Images and Patterns (CAIP), 27-29th August 2007, Vienna, Austria.
You can use a model-based tracker relying on edge detection + (trained) template images
You can use some SLAM techniques combined with a model based tracker.
M. Tamaazousti, V. Gay-Bellile, S. Naudet Collette, S. Bourgeois, M. Dhome. Real-Time Accurate Localization in a Partially Known Environment: Application to Augmented Reality on Textureless 3D Objects. TrakMark 2011, Basel, Switzerland, 26-29/10/2011
If your system will only run indoors, you can look at an RGB-D tracker:
S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, N. Navab
Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes Asian Conference on Computer Vision (ACCV), Korea, Daejeon, November 2012
(access to the software)
It seems you are heading into an interesting topic. However, my concern is the accuracy you would need. Location-based AR would be a starting point for your research, but its granularity is too coarse for your problem domain. Since you have worked on location-based AR applications, you might have noticed that the best accuracy you can expect is around 3 meters. That level of accuracy cannot address your problem domain in any advanced way.
However, I have seen prototypes that address your problem domain. One good example is the BMW Augmented Reality Manual. Check this link: http://www.youtube.com/watch?v=P9KPJlA5yds
That said, I have never come across a proper Augmented Reality library for iOS, or even Android, that can address your problem domain in the marker-less AR context.
The information above is only for your knowledge, but not to discourage you in any way.
Currently I am trying to create an iPhone app that can recognize the objects in an image (car, bus, building, bridge, human, etc.) and label each with its name, with the help of the Internet.
Is there any free service that provides a solution to my problem? Object recognition is itself a complex task that requires digital image processing, neural networks and so on.
Can this be done via an API?
If you want to recognise planar images the current generation of mobile AR SDKs from Metaio, Qualcomm and Layar will allow you to upload images to match against, and perform the matching.
If you want to match freely against a set of 3D objects, e.g. a Toyota Prius or the Empire State Building, the same techniques might be applied to match against sets of images taken at different rotations. However, you might have to limit yourself to matching just one object, because of limits on how large an image database the service allows, or contact those companies for a custom solution. It may also not work very reliably, given that the state of the art is reliable matching against planar images.
If you want to recognize general classes (human, car, building), this is a very difficult problem, and I don't know of any solutions anywhere fast enough to operate online (which I assume is a requirement given you want an AR solution - is that a fair assumption?). It's been a few years since I studied CV, but at that time the most promising solution for visual classification was "bag of visual words" approaches - you might try reading up on those.
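To give a rough idea of what a "bag of visual words" representation looks like, here is a minimal sketch. It assumes you already have local feature descriptors for an image and a codebook of "visual words" (in practice the codebook is usually learned by clustering many descriptors, and the resulting histogram is fed to a classifier). The 2-D descriptors, the codebook values and the function names are toy assumptions for illustration only.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry closest (Euclidean distance) to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(descriptor, codebook[i]))


def bovw_histogram(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Normalised count of how often each visual word occurs in one image."""
    if not descriptors:
        return [0.0] * len(codebook)
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    return [counts[i] / len(descriptors) for i in range(len(codebook))]


# Toy data: two visual words and four descriptors from one hypothetical image.
codebook = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 1.0), (2.0, 0.0), (0.0, 1.0), (9.0, 10.0)]
histogram = bovw_histogram(descriptors, codebook)  # [0.75, 0.25]
```

The point of the representation is that every image, whatever its size or number of features, becomes a fixed-length vector that a standard classifier can compare.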
Take a look at Cortexica. Very useful for this sort of thing.
http://www.cortexica.com/
I haven't done work with mobile AR in a while, but the last time I was working on this stuff I was using Layar and starting to investigate Junaio. Those are oriented toward 3D graphics, not simply text labels, so for your use case you may be better served with OpenCV.
Note that Layar (and I believe Junaio too) works like a web app, where you put the content on your own server and give Layar the URL to link to.
I was looking at some study I will have to do in the future on procedural generation techniques, and I was wondering what type of content you have:
Developed
Helped Develop
Seen implemented
Tried to develop
and what methods/techniques/procedures you used to develop it.
If you feel generous, maybe you can even go into specifics, such as the data structures and algorithms you used to develop it.
If this needs to be made community wiki because I am not asking for a problem to be solved, just let me know.
This is not a homework thread, because it is for a research unit that I'm not taking yet ;)
Introversion Software, the makers of the games Defcon, Uplink and Darwinia (among others), started working about a year ago on a game that makes extensive use of PCG for city generation. Here is a video of their work, and you can read more about it in the game's development diary (start from the first part at the bottom of the page!).
This immediately got me extremely interested, and seeing the potential for games I immediately started researching the technology. I have amassed a folder of 18 PDFs about the subject (research papers, SIGGRAPH presentations, etc). Here, I uploaded it for you.
The main approach is to use L-Systems; however, I never got around to understanding enough of them to make something out of this. I tried other, less successful approaches: using Voronoi diagrams; recursively splitting a rectangular area into smaller areas and shifting the boundaries a little to obtain a bit of randomness; and polygon division.
I got the last method from the posts on Mike's Code Blog (here and here). The screenshots shown on his blog make me drool; it is my biggest programmer's dream to ever get something that looks like that. I emailed him to ask how he did it, and here is the relevant part of his reply. I'm sure he wouldn't mind me posting this here:
L-Systems is definitely one way to go, but that isn't what I'm doing. The basis of my method is polygon subdivision. I start with a simple polygon that represents the entire area of the city. Then, I split it (roughly) in half, and then split those two polygons, etc. until I get down to city-block size. At that point, the edges of all my polygons represent roads. I then use the same subdivision method to break the blocks down into building-size lots.
The devil is in the details, of course, but that is the basic method.
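To make the subdivision idea concrete, here is a minimal sketch. It is a simplification of what Mike describes, not his code: it splits axis-aligned rectangles rather than arbitrary polygons, always cuts across the longer side, and the `jitter` parameter and the sizes used are assumptions of mine.

```python
import random
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def subdivide(rect: Rect, max_size: float, rng: random.Random,
              jitter: float = 0.2) -> List[Rect]:
    """Split `rect` roughly in half, recursively, until both sides are <= max_size."""
    x, y, w, h = rect
    if w <= max_size and h <= max_size:
        return [rect]
    # Cut near the middle; with jitter=0.2 the ratio lies between 0.4 and 0.6.
    ratio = 0.5 + rng.uniform(-jitter, jitter) / 2
    if w >= h:  # cut across the longer side so pieces do not become slivers
        first = (x, y, w * ratio, h)
        second = (x + w * ratio, y, w * (1 - ratio), h)
    else:
        first = (x, y, w, h * ratio)
        second = (x, y + h * ratio, w, h * (1 - ratio))
    return (subdivide(first, max_size, rng, jitter)
            + subdivide(second, max_size, rng, jitter))


rng = random.Random(42)  # fixed seed so the same city is generated every run
city = (0.0, 0.0, 1000.0, 1000.0)
blocks = subdivide(city, max_size=150.0, rng=rng)  # block edges act as roads
lots = [lot for block in blocks
        for lot in subdivide(block, max_size=40.0, rng=rng)]  # building-size lots
```

Running the same function twice, first down to block size and then down to lot size, mirrors the two-stage process in the quote. Seeding the generator is what makes the result reproducible.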
I, for one, still haven't managed to fully implement a solution I'm satisfied with, but achieving something like this remains one of my biggest dreams as a programmer.
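For anyone put off by L-Systems as I was, the core mechanism is just repeated string rewriting: every symbol is replaced in parallel according to a rule table. Below is a minimal sketch of that mechanism only. The first rule set is Lindenmayer's classic "algae" example; the second, road-like rule is made up by me for illustration and is not taken from any of the papers.

```python
from typing import Dict


def expand(axiom: str, rules: Dict[str, str], iterations: int) -> str:
    """Apply the rewrite rules to every symbol in parallel, `iterations` times."""
    current = axiom
    for _ in range(iterations):
        # Symbols without a rule (e.g. '+', '-', '[', ']') are copied unchanged.
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current


# Lindenmayer's algae system: A -> AB, B -> A.
algae = expand("A", {"A": "AB", "B": "A"}, 4)  # "ABAABABA"

# Hypothetical road rule: F = draw a road segment, +/- = turn,
# [ and ] = start and end a side street (the usual bracketed notation).
roads = expand("F", {"F": "F[+F]F[-F]"}, 2)
```

The resulting string is then interpreted as drawing commands (turtle-graphics style) to lay out the actual geometry. The city-generation papers add constraints on top, so treat this as the starting point rather than the whole method.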
Here are a few of the leaders in procedurally generated terrain (and to a lesser extent foliage). If you don't get a detailed answer here regarding methods and techniques, you might want to look in / ask in their forums. I have seen some discussions of techniques there.
TerraGen 2
World Builder
World Machine
Natural Graphics
No one has mentioned the demoscene, which uses ONLY procedural stuff?
So, go search for Werkkzeug, Kkrieger and MilkyTracker to start. You can also visit the site pouet and see the wonder of well-done procedural videos (yes, procedural video clips! With music and graphics, all procedural!)
Allegorithmic's products are used in actual shipping titles. These guys focus on texture generation (both offline and at runtime).
They have some very pretty screenshots and demos.