I want to implement an application, that is able to recognize pictures from camera input. I don't mean classification of objects, but rather detecting the exact single image from given set of images. So if I for example have an album with 500 pictures, then if I point a camera to one of them, then application will be able to tell it's filename. Most of tutorials I find about CoreML is strictly for image classification (recognizing class of object) and not about recognizing exact image name in camera. This needs to work from different angles as well, and all I can have for training the network is this album with many different pictures (single picture for single object). Can this be somehow achieved? I can't use ARKit Image Tracking, because there will be about 500 of these images, and I need to find at least a list of similar ones first with CoreML / Vision.
I am not sure, but I guess perceptual hashing might be able to help you.
It works in a way that it makes some fingerprint from the reference images, and for a given image, it extracts the fingerprints as well, and then you can find the most similar fingerprints.
in this way, even if the new image is not 100% as the image in the dataset, you still can detect it.
It is actually not very hard to implement. but if you would like, i think phash library is a good one to use.
Related
I want to detect shape and then describe it (somehow) to compare it with server data.
So the first question is, is it possible to detect shape like blob with ARKit?
To be more specific, let's describe my usecase generally.
I want to scan image by phone, get the specific shape, send it on server, compare two images on server (server image is the real one, scanned image would be very similar) and then send back some data. I am not asking about server side, the only question about server side is what should I compare - images using OpenCV, some mathematical description of both images and try to find similarity, etc.).
If the question is hard to understand, let's split it on two easy questions:
1) How to scan 2D object by iPhone and save it (trim the specific shape from its background when object is black and background white).
2) Describe scanned object for comparision with almost the same object.
ARKit has no use here.
You will probably need a lot of CoreImage (for fixing perspective distortion and binarization) and OpenCV logic.
Perhaps Vision can help you a little bit with getting ROI from the entire frame, especially if the waveform image is located in some kind of rectangle.
Perhaps you can train a custom ML model that will recognize specific waveforms or waveforms in general to use with Vision.
In any case, it is not a trivial task.
If I have understand well, 3D 360 photos are created from a panorama photo, so I guess it should be possible to create a 3D photo (non 360) from a normal photo. But how? I did not find anything in Google! Any idea of what should I search??
So far, if nothing available (I don't think so), I'll try to duplicate the same photo in each eye. One of the pictures a little bit moved to the right, and the other one moved a little bit to the left. But I think the distortion algorithm is much more complicated.
Note: I'm also receiving answers here: https://plus.google.com/u/0/115463690952639951338/posts/4KdqFcqUTT9
I am in no way certain of this, but my intuition on how 3D 360 images are created in GoogleVR is this:
As you take a panorama image, it actually takes a series of images. As you turn the phone around, the perspective changes slightly with each image, not only by angle, but also offset (except in the unlikely event you spin the phone around its own axis). When it stitches together the final image, it creates one image for each eye, picking suitable images from the series so that it creates a 3D effect when viewed together. The same "area" of the image for each eye comes from a different source image.
You can't do anything similar with a single image. It's the multitude of images produced, each with a different perspective coming from the turning of the phone, that enables the algorithm to create a 3D image.
2D lacks a dimension hence cannot be converted to 3D just like that, but there are clever ways for example Google Pixel even though doesn't have 2 camera can make it seem like the image is 3D by applying some Machine learning algorithm that create the effect of perspective and depth by selective blurring.
3d photos can't be taken by normal but you can take 360 photos with normal camera ..... There are many apps via which you can do this ..... Also there are many algorithms to do it programmatically
I'm a noob to this forum, but wanted to give it a try.
I'm currently learning Objective-C and Cocoa; trying to build my first iPhone app.
One thing I'm working on is allowing the user to cut his/her face from an image they have taken and paste it into another image. (The idea is cut from one image and paste into another image with a spot for a face to go.)
How can this be done? I am thinking I would allow the user to just touch and drag over their face, in the shape of a rectangle, and then allow them to copy.
Thanks for the help.
Ok, nevertheless your bit arrogant style of asking, here are some guidelines about how to start: generic obj-c/iOS development (start from hello world); UIImage class; camera API; image processing algorithms, face detection algorithms. Go on gradually and do not wish to resolve all problems at once. Write first an application that simple loads an arbitrary photo and shows it to the user. Then modify it that you can crop a specified rectangular area from the image and save it into the new file. Then write an app that switches on the camera that you can take an image and save it to the disk. Then unite what you wrote that you save only a cropped area of the captured image.
When you arrive to this point, you will know much more about software development image handling. AFTER THIS you can start looking for image processing algorithms. Start also here with something simple like a trivial blur filter or similar implemented by you. If you know already a bit of image processing, search for face detection algorithms on the net. It is even possible that you will find some ready framework that includes also these features, or at least you will understand the concepts. You can even come back here to stack overflow and ask for suggestions about a good face detection algorithms, however we still prefer if you have chosen already one and have some concrete issue with it.
I have a database of images of one person who is using his hands to show various words and phrases in sign language. The background is white and the only thing changing is the shape of the person's hands and their locations. Now in my gui in matlab, I want the user to be able to choose another image from the same person that was taken at another time doing a sign but wearing the same clothes and then the program will have to compare this against the images in the database and show the most similar. Obviously I can't do pixel by pixel comparison as the images were taken by a hand held mobile camera and slight movement has been inevitable so I should try and locate the hands in the images and compare their shapes. I have no idea how to go about this? I have to say I am new to image processing toolbox in matlab.
Your help is much appreciated
I am doing a phD in computer vision, and I can tell you that it is an unsolved problem. (even in your simple framewrok, with white background)
If you are interested, you might read some works about it ar MIT:
http://people.csail.mit.edu/rywang/handtracking/
or at Oxford:
http://www.robots.ox.ac.uk/~vgg/research/sign_language/index.html
http://www.robots.ox.ac.uk/~vgg/research/hands/index.html
I disagree with you. Such a project can achieve results quickly.
This becomes a problem as soon as the project has to deal with "real life".
Using a single camera, and a completely known background; Opencv provides a simple way to extract hand shape in a image (in about 20 lines of code). You will find plenty of source on the web (have a look at calcbackproj).
After that, what you will have to do is to play with shape, and search for characteristic points.
Begin with some simple signs (example : a circle and a V). How would you recognize one from the other?
There are thousands of papers on sign language; just read the older one to simple ideas flowing :)
I want to remove watermark from a picture within my iPhone / iPad application. Is there any kind of image processing I can perform within this application to do this?
Can't be done, sorry.
The watermarked image were originally two images (the base and the watermark), which were merged together to form the result. The problem here is that the most common image formats (such as JPG, PNG, or GIF) have no concept of layers - so that the base would be one layer, and the watermark another: the result is just one layer, onto which both were redrawn. This is somewhat similar to a physical painting: if you paint one image on a paper using watercolors, and then another over the same spot, their colors will mix and you won't be able to tell which parts belong to one or the other, as they'd become a single image.
This is similar with the computer image formats: there is only one "layer", which for every pixel encodes exactly one color that is there - only the current color exists, and the image doesn't keep track what was on that pixel before.
Now, the information is irreversibly lost from the result - in other words, it is not possible to recover the base knowing just the result (or the result and watermark) - BTW, that's exactly the point of watermarking.
I have borrowed the image sprites of StackOverflow for a demonstration; the actual images used are not unique, the technique would work just as well with any images. This was the watermark I used:
And this is the result image, after merging with the base:
Now, even though we have the exact watermark image used, there's no way to recover what was underneath that star in the original image. Through image processing operations, we could almost remove the star from the result, but there's not enough data to tell us what used to be underneath: - that information got erased in the merge at the beginning.
We could guess what used to be there, but then we're not doing recovery any more, we're interpreting the image and guessing what possibly could have been there - and that's pretty hard, even for a human; computers are really bad at that. This is the original image, before I watermarked it - I bet you were expecting something slightly different, no?
The watermark is almost certainly part of the image. (The only case in which it wouldn't be is something like PDF or SVG, where it could be a separate vector element.)
Watermarks are typically present on images for purposes of managing intellectual property; if one has licensed an image for a particular use, typically one will receive access to a version of the image without a watermark. Thus wanting to "remove watermarks" is also likely to be treated as highly suspicious.
Watermarks are part of the image, there isn't going to be a magic way to remove them and recover the missing pixels in any tool.
Take a look at the source! Most or the current watermarking is done in php as an automated script. In most cases you will see the base picture in source