fast image annotation for large dataset - annotations

I have a dataset related to plant disease. the dataset has 70k images and 38 classes dataset and I want to annotate images with a bounding box. I try for annotation LabelImg and other annotation tools but all of them are so time-consuming. is there any other technique for image annotation which is fast, simplest, and automatic or can I do with programming?
please help me out to solve this problem.

I used hasty.ai on a recent project. Their tool has an AI which observes you while you label. After learning on a few images only it suggests annotations which you can accept or correct (and by that retrain the model). Once the AI makes good annotations, you can batch process all your remaining images. It allowed me to annotate a huge dataset within a few days only.
You can use it for free in the beginning! I switched to pro, then, as it was worth the money and the price is reasonable.

Related

Object detection for a single object only

I have been working with object detection. But these methods consist of very deep neural networks and require lots of memory to store the trained models. E.g. I once tried to train a Mask R-CNN model, and the weights take 200 MB.
However, my focus is on detecting a single object only. So, I guess these methods are not suitable. Are there any object detection method that can do this job with a low memory requirement?
You can try SSD or faster RCNN they are easily available in Tensorflow object detection API
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
here you can get pre-trained models and config file
you can select your model by taking look on speed and mAP(accuracy) column as per your requirement.
Following mukul's answer, I specifically recommend you check out SSDLite-MobileNetV2.
It's a lite-weight model, which is still enough expressive for good results.
Especially when you're restricting yourself to a single class, as you can see in the example of FaceSSD-MobileNetV2 as in here (Note however this is vanilla SSD).
So you can simply Take the pre-trained model of SSDLite-MobileNetV2 with the corresponding config file, and modify it for a single class.
This means changing num_classes to 1, modifying the label_map.pbtxt, and of course - preparing the dataset with the single class you want.
If you want a more robust model, but which has no pre-trained mode, you can use an FPN version.
Checkout this config file, which is with MobileNetV1, and modify it for your needs (e.g. switching to MobileNetV2, switching to use_depthwise, etc).
On one hand, there's no detection pre-trained model, but on the other the detection head is shared over all (relevant) scales, so it's somewhat easier to train.
So simply fine-tune it from the corresponding classification checkpoint from here.

Clustering Category Purchases in Customer Data

I am attempting to cluster a group of customers based on spend, order frequency, order breadth and what % of purchases they make in each category (there are around 20).
It will probably be a simple answer but I cannot figure out whether I should standardize (subtract mean and divide by sd) the % category buy columns or not. When I dont standardize I can get around 90% of the variance explained in 4-5 principal components (using SVD), but when I standardize each column I only get around 40% for the same number of principal components. My worry is that because each column is related, I am removing the relationship by standardizing. At the same time I am worried that not standardizing will cause issues with the other variables in the data that I have standardized.
I would assume if others tried clustering in this way they would face a similar issue but I cant seem to find one so it might be that I just dont understand the situation. Thanks for any clarification in advance!
Chris,
Percentage scale has a well defined range and nice properties.
By heuristically scaling these features you usually make things worse.

What kind of features are extracted with the AlexNet layers?

Question is regarding this method, which extracts features from the FC7 layer of AlexNet.
What kind of features is it actually extracting?
I used this method on images of paintings done by two artists. The training set is about 150 training images from each artist (so that the features are stored in a 300x4096 matrix); the validation set is 40 images. This works really well, 85-90% correct classification. I would like to know why it works so well.
WHAT FEATURES ?
FC8 is the classification layer; FC7 is the one before it, where all of the prior kernel pixels are linearised and concatenated. These represent the abstract, top-level features that the model training has "discovered". To examine these features, try one of the many layer visualization tools available on line (don't ask for references here; SO bans requests for resources).
The features vary from one training to another, depending on the kernel initialization (usually random) and very dependent on the training set. However, the features tend to be simple in the early layers, with greater variety and detail in the later ones. For instance, on the original AlexNet target (ILSVRC 2012, a.k.a. ImageNet data set), the FC7 features often include vehicle tires, animal faces, various types of flower petals, green leaves and stems, two-legged animal torsos, airplane sections, car/truck/bus grill work, etc.
Does that help?
WHY DOES IT WORK SO WELL ?
That depends on the data set and training parameters. How different are the images from the artists? There are plenty of features to extract: choice of subject, palette, compositional complexity, hard/soft edges, even direction of brush strokes. For instance, differentiating any two early cubists could be a little tricky; telling Rembrandt from Jackson Pollack should hit 100%.

Clustering or classification?

I am stuck between a decision to apply classification or clustering on the data set I got. The more I think about it, the more I get confused. Heres what I am confronted with.
I have got news documents (around 3000 and continuously increasing) containing news about companies, investment, stocks, economy, quartly income etc. My goal is to have the news sorted in such a way that I know which news correspond to which company. e.g for the news item "Apple launches new iphone", I need to associate the company Apple with it. A particular news item/document only contains 'title' and 'description' so I have to analyze the text in order to find out which company the news referes to. It could be multiple companies too.
To solve this, I turned to Mahout.
I started with clustering. I was hoping to get 'Apple', 'Google', 'Intel' etc as top terms in my clusters and from there I would know the news in a cluster corresponds to its cluster label, but things were a bit different. I got 'investment', 'stocks', 'correspondence', 'green energy', 'terminal', 'shares', 'street', 'olympics' and lots of other terms as the top ones (which makes sense as clustering algos' look for common terms). Although there were some 'Apple' clusters but the news items associated with it were very few.I thought may be clustering is not for this kind of problem as many of the company news goes into more general clusters(investment, profit) instead of the specific company cluster(Apple).
I started reading about classification which requires training data, The name was convincing too as I actually want to 'classify' my news items into 'company names'. As I read on, I got an impression that the name classification is a bit deceiving and the technique is used more for prediction purposes as compared to classification. The other confusions that I got was how can I prepare training data for news documents? lets assume I have a list of companies that I am interested in. I write a program to produce training data for the classifier. the program will see if the news title or description contains the company name 'Apple' then its a news story about apple. Is this how I can prepare training data?(off course I read that training data is actually a set of predictors and target variables). If so, then why should I use mahout classification in the first place? I should ditch mahout and instead use this little program that I wrote for training data(which actually does the classification)
You can see how confused I am about how to address this issue. Another thing that concerns me is that if its possible to make a system this intelligent, that if the news says 'iphone sales at a record high' without using the word 'Apple', the system can classify it as a news related to apple?
Thank you in advance for pointing me in the right direction.
Copying my reply from the mailing list:
Classifiers are supervised learning algorithms, so you need to provide
a bunch of examples of positive and negative classes. In your example,
it would be fine to label a bunch of articles as "about Apple" or not,
then use feature vectors derived from TF-IDF as input, with these
labels, to train a classifier that can tell when an article is "about
Apple".
I don't think it will quite work to automatically generate the
training set by labeling according to the simple rule, that it is
about Apple if 'Apple' is in the title. Well, if you do that, then
there is no point in training a classifier. You can make a trivial
classifier that achieves 100% accuracy on your test set by just
checking if 'Apple' is in the title! Yes, you are right, this gains
you nothing.
Clearly you want to learn something subtler from the classifier, so
that an article titled "Apple juice shown to reduce risk of dementia"
isn't classified as about the company. You'd really need to feed it
hand-classified documents.
That's the bad news, but, sure you can certainly train N classifiers
for N topics this way.
Classifiers put items into a class or not. They are not the same as
regression techniques which predict a continuous value for an input.
They're related but distinct.
Clustering has the advantage of being unsupervised. You don't need
labels. However the resulting clusters are not guaranteed to match up
to your notion of article topics. You may see a cluster that has a lot
of Apple articles, some about the iPod, but also some about Samsung
and laptops in general. I don't think this is the best tool for your
problem.
First of all, you don't need Mahout. 3000 documents is close to nothing. Revisit Mahout when you hit a million. I've been processing 100.000 images on a single computer, so you really can skip the overhead of Mahout for now.
What you are trying to do sounds like classification to me. Because you have predefined classes.
A clustering algorithm is unsupervised. It will (unless you overfit the parameters) likely break Apple into "iPad/iPhone" and "Macbook". Or on the other hand, it may merge Apple and Google, as they are closely related (much more than, say, Apple and Ford).
Yes, you need training data, that reflects the structure that you want to measure. There is other structure (e.g. iPhones being not the same as Macbooks, and Google, Facebook and Apple being more similar companies than Kellogs, Ford and Apple). If you want a company level of structure, you need training data at this level of detail.

trainning neural network

I have a picture.1200*1175 pixel.I want to train a net(mlp or hopfield) to learn a specific part of it(201*111pixel) to save its weight to use in a new net(with the same previous feature)only without train it to find that specific part.now there are this questions :what kind of nets is useful;mlp or hopfield,if mlp;the number of hidden layers;the trainlm function is unuseful because "out of memory" error.I convert the picture to a binary image,is it useful?
What exactly do you need the solution to do? Find an object with an image (like "Where's Waldo"?). Will the target object always be the same size and orientation? Might it look different because of lighting changes?
If you just need to find a fixed pattern of pixels within a larger image, I suggest using a straightforward correlation measure, such as crosscorrelation to find it efficiently.
If you need to contend with any of the issues mentioned above, then there are two basic solutions: 1. Build a model using examples of the object in different poses, scalings, etc. so that the model will recognize any of them, or 2. Develop a way to normalize the patch of pixels being examined, to minimize the effect of those distortions (like Hu's invariant moments). If nothing else, yuo'll want to perform some sort of data reduction to get the number of inputs down. Technically, you could also try a model which is invariant to rotations, etc., but I don't know how well those work. I suspect that they are more tempermental than traditional approaches.
I found AdaBoost to be helpful in picking out only important bits of an image. That, and resizing the image to something very tiny (like 40x30) using a Gaussian filter will speed it up and put weight on more of an area of the photo rather than on a tiny insignificant pixel.