CustomVision.ai Object Detection vs Image Classification

I want to use CustomVision.ai to detect packs of cigarettes in a shopping cart. I have tons of images of shopping-cart contents without cigarettes, and several images of the contents with cigarettes. All images are taken from the same angle, roughly 1 meter above the cart.
So my question is: should I use CustomVision.ai's object detection to localize the area containing the cigarettes, or should I use a classifier to simply flag all images that contain cigarettes, since I could also make use of the tons of negative examples here?
The resolution of these images is 640x480px.


SwiftUI tap on image to select an area - better way than this?

Updating an app I did for a car club that connects their customers (dealerships, parties, firehouse events, town events, tv commercials, magazine ads, etc) with their members to rent out fancy/classic/muscle cars for photo ops and eye-candy at events. The car owner gets paid, the car club takes a small percentage for club costs and events. It handles CRM stuff, scheduling, photos, etc.
The new feature they want is a way to quickly look over a car before and after an event, tap an area on the screen and describe any damages (plus other functions). They want to be able to look up stuff over time and do comparisons, etc., perhaps generate repair invoices, etc.
I have come up with a basic formula that works: an image of the car is displayed, with a transparent mask image above it in the z-order, masked with a different color per region. The user taps, I look up the color at the tap location, draw a circle on the image, use that color as an index into a part/region list, record all the info, and Bob's your uncle.
This just shortcuts having a bunch of drop-downs or selectors to manually pick a part or region from a list, and gives it some visual sugar.
It works, works nicely, and is consistently reliable (the images are PNGs; colors get munched up too much by JPEG compression). It all falls apart if they decide they want to change images: they want me to retroactively draw circles on the new images based on the old records' information. My firm line so far has been "no, you can't do that", because the tap locations are tied to the original images. They're insistent on trying, so...
I have two questions.
The first is simple: am I missing some painfully obvious better way to do this (select a known value for a section of a graphic)?
The second: loading stock images into the asset catalog, displaying them from 1x images, finding the scale value, adjusting tap locations, etc., all works great. At 2x and 3x, the scaling gets wonky. Loading from storage is the bigger issue: it seems that when I load a pair of image files from storage, turn them into Data objects, then shove those into a UIImage for display in a SwiftUI Image view, I lose the easy scaling I get when I embed images in the asset catalog under the 1x slot. Is there a way to go file -> Data -> UIImage -> Image(uiImage: xxx) and force a 1x rendering, skipping any auto-rendering/scaling that iOS might do?
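One thing that might help with the forced-1x part (a minimal sketch, not tested against this exact setup): UIImage has an init(data:scale:) initializer, so the scale can be pinned to 1.0 when the image is built from the file data instead of letting UIKit infer one. The URL here is a placeholder for wherever the PNGs actually live on disk:

```swift
import SwiftUI
import UIKit

/// Loads an image file from disk and wraps it in a UIImage whose scale is
/// pinned to 1.0, so point coordinates map 1:1 to pixel coordinates.
/// `url` is a placeholder for wherever the car/mask PNGs are stored.
func loadImageAt1x(from url: URL) -> UIImage? {
    guard let data = try? Data(contentsOf: url) else { return nil }
    // UIImage(data:scale:) lets us choose the scale instead of letting UIKit
    // infer one from the screen or from @2x/@3x file-name suffixes.
    return UIImage(data: data, scale: 1.0)
}

struct CarInspectionView: View {
    let carImageURL: URL   // placeholder path to the car photo on disk

    var body: some View {
        if let uiImage = loadImageAt1x(from: carImageURL) {
            // uiImage.size now reports the true pixel size, which keeps the
            // tap-location math consistent across 1x/2x/3x devices.
            Image(uiImage: uiImage)
        } else {
            Text("Image not found")
        }
    }
}
```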
Thoughts?
The quickly masked sample images I'm using display the car and the mask; each green area differs slightly in the RGB G value, and I just use that value as the lookup key for the part name in the description ("Front left fender", "Rocker panel", "Left rear wheel", "Windshield", and so forth).
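For reference, here is a rough sketch of the color-lookup step, assuming the mask was loaded at scale 1.0 and the tap point has already been converted into the mask's pixel coordinate space; the green-value-to-part-name table below is made up for illustration:

```swift
import UIKit

/// Reads the RGBA value of a single pixel from the mask image.
/// Assumes the mask was loaded at scale 1.0 and that `point` is already in
/// the mask's pixel coordinate space.
func maskColor(at point: CGPoint, in mask: UIImage) -> (r: UInt8, g: UInt8, b: UInt8, a: UInt8)? {
    guard let cgImage = mask.cgImage else { return nil }
    let width = cgImage.width
    let height = cgImage.height
    let x = Int(point.x), y = Int(point.y)
    guard x >= 0, y >= 0, x < width, y < height else { return nil }

    // Re-render just the pixel we care about into a known RGBA8888 buffer,
    // so we never have to reason about the PNG's native byte layout.
    var pixel = [UInt8](repeating: 0, count: 4)
    let rendered: Bool = pixel.withUnsafeMutableBytes { buffer in
        guard let context = CGContext(data: buffer.baseAddress,
                                      width: 1, height: 1,
                                      bitsPerComponent: 8, bytesPerRow: 4,
                                      space: CGColorSpaceCreateDeviceRGB(),
                                      bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
        else { return false }
        // Core Graphics has a bottom-left origin, so flip y when positioning
        // the full image over the 1x1 buffer.
        context.draw(cgImage, in: CGRect(x: -CGFloat(x),
                                         y: -CGFloat(height - 1 - y),
                                         width: CGFloat(width),
                                         height: CGFloat(height)))
        return true
    }
    guard rendered else { return nil }
    return (pixel[0], pixel[1], pixel[2], pixel[3])
}

// Hypothetical lookup table keyed on the mask's green channel value.
let partsByGreenValue: [UInt8: String] = [
    10: "Front left fender",
    20: "Rocker panel",
    30: "Left rear wheel",
    40: "Windshield"
]

func partName(at tapPoint: CGPoint, in mask: UIImage) -> String? {
    guard let color = maskColor(at: tapPoint, in: mask) else { return nil }
    return partsByGreenValue[color.g]
}
```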

Is it possible to limit Google AutoML image augmentation?

I have seen that Google's AutoML will use some pre-training image augmentation to increase the robustness of the model.
I have searched the documentation and forums for a way to limit these techniques. For instance, it applies flips to the objects, but in some cases flips hurt the predictions, such as when recognizing numbers in an image. For most fonts, 2s and 5s are different enough to have distinct features even when flipped. However, on a 7-segment display, a flipped 2 has the same representation as a 5 (and vice versa).
7-segment display example
I have labeled hundreds of images with many digits in each image. The model continues to confuse the 2's and 5's for the 7-segment displays. It has some success but not an acceptable amount.
Does anyone know if limiting the image augmentation with AutoML is possible?

CoreML Image Detection

I want to implement an application that is able to recognize pictures from the camera input. I don't mean classification of objects, but rather detecting the exact single image from a given set of images. So if I have, for example, an album with 500 pictures and I point the camera at one of them, the application should be able to tell me its filename. Most of the tutorials I find about CoreML are strictly about image classification (recognizing the class of an object), not about recognizing the exact image name from the camera. This needs to work from different angles as well, and all I have for training the network is this album with many different pictures (a single picture per object). Can this somehow be achieved? I can't use ARKit Image Tracking, because there will be about 500 of these images, and I need to find at least a list of similar ones first with CoreML / Vision.
I am not sure, but I guess perceptual hashing might be able to help you.
It works by computing a fingerprint from each reference image; you extract a fingerprint from the query image as well and then look for the most similar fingerprints.
This way, even if the new image is not 100% identical to the image in the dataset, you can still detect it.
It is actually not very hard to implement, but if you'd prefer an existing solution, I think the pHash library is a good one to use.
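If you would rather roll it yourself than wrap the pHash library, a minimal "average hash" sketch in Swift might look roughly like this (the 8x8 size is just the conventional choice; this illustrates the idea rather than replacing pHash):

```swift
import UIKit

/// Minimal "average hash" (aHash) sketch: shrink the image to 8x8 grayscale,
/// then set one bit per pixel depending on whether it is brighter than the
/// mean. Visually similar images end up with hashes that differ in few bits.
func averageHash(of image: UIImage) -> UInt64? {
    guard let cgImage = image.cgImage else { return nil }
    let side = 8
    var pixels = [UInt8](repeating: 0, count: side * side)

    let rendered: Bool = pixels.withUnsafeMutableBytes { buffer in
        guard let context = CGContext(data: buffer.baseAddress,
                                      width: side, height: side,
                                      bitsPerComponent: 8, bytesPerRow: side,
                                      space: CGColorSpaceCreateDeviceGray(),
                                      bitmapInfo: CGImageAlphaInfo.none.rawValue)
        else { return false }
        context.interpolationQuality = .medium
        context.draw(cgImage, in: CGRect(x: 0, y: 0, width: side, height: side))
        return true
    }
    guard rendered else { return nil }

    let mean = pixels.reduce(0) { $0 + Int($1) } / pixels.count
    var hash: UInt64 = 0
    for (index, value) in pixels.enumerated() where Int(value) > mean {
        hash |= UInt64(1) << UInt64(index)
    }
    return hash
}

/// Number of differing bits between two hashes; a small distance means the
/// images are likely similar.
func hammingDistance(_ a: UInt64, _ b: UInt64) -> Int {
    (a ^ b).nonzeroBitCount
}
```

A possible workflow: hash the 500 album images once and store the hashes with their filenames, then hash each camera frame and keep the album entries with the smallest Hamming distance as candidates for a closer comparison.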

How to detect contours of object and describe it to compare on server with ARKit

I want to detect a shape and then describe it (somehow) so I can compare it with data on a server.
So the first question is: is it possible to detect a blob-like shape with ARKit?
To be more specific, let me describe my use case generally.
I want to scan an image with the phone, extract the specific shape, send it to a server, compare the two images on the server (the server image is the reference; the scanned image will be very similar), and then send back some data. I am not asking about the server side; the only server-side question is what I should compare (images using OpenCV, some mathematical description of both images that I can use to measure similarity, etc.).
If the question is hard to understand, let me split it into two simpler questions:
1) How do I scan a 2D object with an iPhone and save it (trimming the specific shape from its background, given that the object is black and the background white)?
2) How do I describe the scanned object for comparison with an almost identical object?
ARKit has no use here.
You will probably need a lot of CoreImage (for fixing perspective distortion and binarization) and OpenCV logic.
Perhaps Vision can help you a little bit with getting ROI from the entire frame, especially if the waveform image is located in some kind of rectangle.
Perhaps you can train a custom ML model that will recognize specific waveforms or waveforms in general to use with Vision.
In any case, it is not a trivial task.
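As a starting point, here is a hedged sketch of the Vision + Core Image part of that pipeline: detect the rectangle that contains the waveform, correct its perspective, and binarize the result. It assumes the CIFilterBuiltins filters are available (colorThreshold needs iOS 14+), and the threshold value is a placeholder you would tune for your lighting:

```swift
import Foundation
import Vision
import CoreImage
import CoreImage.CIFilterBuiltins

/// Finds the most prominent rectangle in the frame, fixes its perspective,
/// and binarizes it so only the dark shape on the light background remains.
/// Error handling is kept minimal on purpose.
func extractShape(from ciImage: CIImage, completion: @escaping (CIImage?) -> Void) {
    let request = VNDetectRectanglesRequest { request, _ in
        guard let rect = (request.results as? [VNRectangleObservation])?.first else {
            completion(nil)
            return
        }
        // Vision returns normalized corners (origin bottom-left); scale them
        // to the image's pixel extent for Core Image.
        let size = ciImage.extent.size
        func scaled(_ p: CGPoint) -> CGPoint { CGPoint(x: p.x * size.width, y: p.y * size.height) }

        let perspective = CIFilter.perspectiveCorrection()
        perspective.inputImage = ciImage
        perspective.topLeft = scaled(rect.topLeft)
        perspective.topRight = scaled(rect.topRight)
        perspective.bottomLeft = scaled(rect.bottomLeft)
        perspective.bottomRight = scaled(rect.bottomRight)

        // Simple binarization; 0.5 is a placeholder threshold to tune.
        let threshold = CIFilter.colorThreshold()
        threshold.inputImage = perspective.outputImage
        threshold.threshold = 0.5

        completion(threshold.outputImage)
    }
    request.maximumObservations = 1

    let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request])
    }
}
```

From there you could upload the binarized image (or a contour description extracted from it with OpenCV) and do the actual comparison on the server.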

How to draw an image according to the pixels of another image?

Hi all. What I want is to map one image onto another. Suppose I have two images of people, one of a fat person and one of a thin person. I want to match their faces and eyes, and increase or decrease the face size and eye size of one image to match the other, the way you can make a face fatter or squeeze it in Adobe Photoshop. These are the kinds of image manipulations I want to implement, and I don't know where to start.
Please guide and help me. Can I do all of this with Core Graphics, and if so, how?
Any references, tutorials, or sample code would be appreciated.
You are probably going to have to deal with some sort of edge detection and face recognition algorithms, at the very least, if this is to be accomplished automatically. Otherwise, if the user is going to be resizing one image to match the other, this will require simple resizing operations, driven perhaps by user pinch and zoom gestures.
UPDATE:
For manual resizing:
Download the source code for the great book Cool iPhone Projects. One of the projects is called 'Touching'. This project contains code that accomplishes what you need: pinch and zoom functionality.
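That book is fairly old at this point, but the same idea (a pinch gesture that scales an image view so the user can manually resize one face to match the other) can be sketched in a few lines of modern UIKit. This is only an illustration of the approach, not the book's code, and the "personB" asset name is a placeholder:

```swift
import UIKit

final class FaceAdjustViewController: UIViewController {

    // Placeholder asset: the image being resized to match the other face.
    private let adjustableImageView = UIImageView(image: UIImage(named: "personB"))

    override func viewDidLoad() {
        super.viewDidLoad()
        adjustableImageView.isUserInteractionEnabled = true
        adjustableImageView.center = view.center
        view.addSubview(adjustableImageView)

        let pinch = UIPinchGestureRecognizer(target: self,
                                             action: #selector(handlePinch(_:)))
        adjustableImageView.addGestureRecognizer(pinch)
    }

    @objc private func handlePinch(_ gesture: UIPinchGestureRecognizer) {
        guard let pinchedView = gesture.view,
              gesture.state == .began || gesture.state == .changed else { return }
        // Apply the incremental scale, then reset it so the next callback
        // reports the change relative to the current size.
        pinchedView.transform = pinchedView.transform.scaledBy(x: gesture.scale,
                                                               y: gesture.scale)
        gesture.scale = 1.0
    }
}
```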