Similarity search API of Watson Visual Recognition for bulk volumes of images

We have a customer requirement to search similar images in a collection using Watson Visual Recognition. The documentation mentions that each collection can contain 1 million images. Thus, I have the following questions:
a) What is the maximum size of the image?
b) Each image upload takes up to 1 second, and the Standard plan has a limit of 25,000 images per day. So can only 25k images be added to the collection per day?
c) The customer has about 2 million images. How can we upload the images faster?
d) Is there a separate plan available for bulk volumes?

This information comes from the Visual Recognition documentation at the following URL:
https://www.ibm.com/watson/developercloud/doc/visual-recognition/customizing.html
Size limitations
There are size limitations for training calls and data:
The service accepts a maximum of 10,000 images or 100 MB per .zip file.
The service requires a minimum of 10 images per .zip file.
The service accepts a maximum of 256 MB per training call.
The minimum recommended size of an image is 32x32 pixels.
Guidelines for good training
The following guidelines are not enforced by the API. However, the service tends to perform better when the training data adheres to them:
A minimum of 50 images is recommended in each .zip file, as fewer than 50 images can decrease the quality of the trained classifier.
If the quality and content of training data is the same, then classifiers that are trained on more images will generally be more accurate than classifiers that are trained on fewer images. The benefits of training a classifier on more images plateau at around 5000 images, and that many images can take a while to process. You can train a classifier on more than 5000 images, but it may not significantly increase that classifier's accuracy.
Uploading a total of 150-200 images per .zip file gives you the best balance between the time it takes to train and the improvement to classifier accuracy. More than 200 images increases the time, and it does increase the accuracy, but with diminishing returns for the amount of time it takes.
Include approximately the same number of images in each examples file. Including an unequal number of images can cause the quality of the trained classifier to decline.
The accuracy of your custom classifier can be affected by the kinds of images you provide to train it. Provide example images that are similar to the images you plan to analyze. For example, if you are training the classifier "tiger", your classifier might be less accurate if you train it only on mobile-phone images of tigers in a zoo but then test it on images of tigers in the wild taken by professional photographers.
Guidelines for high-volume classifying
If you want to classify many images, submitting one image at a time can take a long time. You can maximize the efficiency and performance of the service in the following ways (a sketch combining these tips follows the list):
Resize images to be no larger than 320 pixels in either width or height. Images do not need to be high resolution.
Submit images in batches as compressed (.zip) files.
Specify only the classifiers you want results for in the classifier_ids parameter. If you do not specify a value for this parameter, the service classifies the images against the default classifier and takes longer to return a response.
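To make those tips concrete, here is a minimal sketch that resizes images, zips them, and classifies the whole batch in one call. It is an illustration under assumptions, not official sample code: the API key and classifier ID are placeholders, and the endpoint and parameter names follow the v3 REST documentation of that era.

```python
# Sketch of the high-volume tips: resize to <= 320 px, batch into a
# .zip, and pass classifier_ids explicitly. API key and classifier ID
# below are placeholders, not real values.
import json
import zipfile

import requests
from PIL import Image

API_KEY = "YOUR_API_KEY"            # placeholder
CLASSIFIER_ID = "mycollection_123"  # hypothetical custom classifier
URL = "https://gateway-a.watsonplatform.net/visual-recognition/api/v3/classify"

def resize_and_zip(image_paths, zip_path="batch.zip", max_side=320):
    """Shrink each image so neither side exceeds max_side, then zip them."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for i, path in enumerate(image_paths):
            img = Image.open(path)
            img.thumbnail((max_side, max_side))   # keeps aspect ratio
            resized = "resized_%d.jpg" % i
            img.convert("RGB").save(resized, "JPEG")
            zf.write(resized)
    return zip_path

def classify_batch(zip_path):
    with open(zip_path, "rb") as f:
        resp = requests.post(
            URL,
            params={"api_key": API_KEY, "version": "2016-05-20"},
            files={"images_file": f},
            data={"parameters": json.dumps(
                {"classifier_ids": [CLASSIFIER_ID]})},
        )
    resp.raise_for_status()
    return resp.json()

print(classify_batch(resize_and_zip(["img1.jpg", "img2.jpg"])))
```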

Ravi, I see you posted your question on developerWorks too - please see my answer here: https://developer.ibm.com/answers/questions/379227/similarity-search-api-of-watson-visual-recognition/

Related

Choice of Neural Network and Activation Function

I am very new to the field of neural networks. Apologies if this question is very amateurish.
I am looking to build a neural network model to predict whether a particular image that I am about to post on a social media platform will get a certain engagement rate.
I have around 120 images with historical data about the engagement rate. The following information is available:
Images of size 501 px x 501 px
Type of image (Exterior photoshoot/Interior photoshoot)
Day of posting the image (Sunday/Monday/Tuesday/Wednesday/Thursday/Friday/Saturday)
Time of posting the image (18:33, 10:13, 19:36 etc)
No. of people who have seen the post (15659, 35754, 25312 etc)
Engagement rate (5.22%, 3.12%, 2.63% etc)
I would like the model to predict if a certain image when posted on a particular day and time will give an engagement rate of 3% or more.
As you may have noticed, the input data is images, text (signifying what type or day), time and numbers.
Could you please help me understand how to build a neural network for this problem?
P.S.: I am very new to this field. It would be great if you could give detailed direction on how I should proceed to solve this problem.
A neural network has three kinds of neuronal layers:
Input layer. It stores the inputs this network will receive. The number of neurons must equal the number of inputs you have;
Hidden layer. It uses the inputs that come from the previous layer and does the necessary calculations to obtain a result, which it passes to the output layer. More complex problems may require more than one hidden layer. As far as I know, there is no algorithm to determine the number of neurons in this layer, so you determine it by trial and error and previous experience;
Output layer. It gets the results from the hidden layer and returns them to the user. The number of neurons in the output layer equals the number of outputs you have.
According to what you write here, your training database has 5 inputs and one output (the engagement rate). This means that your artificial neural network (ANN) will have 5 neurons on the input layer and one neuron on the output layer.
I am not sure you can pass raw images as inputs to this kind of neural network. Also, because in theory there are infinitely many types of images, I think you should categorize them, with each category receiving a number. An example of categorization would be:
Images with dogs are in category 1;
Images with hospitals are in category 2, etc.
So, your inputs will look like this (see the encoding sketch after this list):
Image category (dogs=1, hospitals=2, etc.);
Type of image (Exterior photoshoot=1, interior photoshoot=2);
Posting day (Sunday=1, Monday=2, etc.);
Time of posting the image;
Number of people who have seen the post;
Engagement rate (this is the output, not an input).
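A minimal sketch of that encoding in Python; the category, type, and day mappings below are illustrative examples, not a fixed standard:

```python
# Hypothetical encoding of one post's metadata into a numeric input
# vector; the engagement rate is kept separate as the target output.
IMAGE_CATEGORY = {"dogs": 1, "hospitals": 2}   # example categories
IMAGE_TYPE = {"exterior": 1, "interior": 2}
DAY = {"Sunday": 1, "Monday": 2, "Tuesday": 3, "Wednesday": 4,
       "Thursday": 5, "Friday": 6, "Saturday": 7}

def encode(record):
    """Turn one record into the 5 numeric inputs described above."""
    hours, minutes = map(int, record["time"].split(":"))
    return [
        IMAGE_CATEGORY[record["category"]],
        IMAGE_TYPE[record["type"]],
        DAY[record["day"]],
        hours * 60 + minutes,      # time of day in minutes
        record["viewers"],
    ]

x = encode({"category": "dogs", "type": "exterior", "day": "Monday",
            "time": "18:33", "viewers": 15659})
print(x)  # [1, 1, 2, 1113, 15659]
```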
The number of hidden layers and the number of neurons in each hidden layer depend on your problem's complexity. Having 120 pictures, I think one hidden layer with 10 neurons on it is enough.
The ANN will have one output neuron (the engagement rate).
Once the database containing the information about the 120 pictures (known as the training database) is created, the next step is to train the ANN on it. However, there are some caveats here.
Training an ANN means computing the parameters of the hidden neurons with an optimization algorithm so that the sum of squared errors is minimized. The training process has some degree of randomness to it. To minimize the effect of this randomness and to get estimations as precise as possible, your training database must have:
Consistent data;
Many records.
I don't know how consistent your data are, but in my experience, a small training database with consistent data beats a huge database with inconsistent data.
Judging by the problem, I think you should use the default activation function provided by the software you use for ANN handling.
Once you have trained your ANN, it is time to see how efficient the training was. The software you use for ANNs should provide tools to estimate this, and those tools should be documented. If the training is satisfactory, you may begin using the network. If it is not, you may either re-train the ANN or use a larger database.
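As a concrete illustration of the architecture described above (5 inputs, one hidden layer with 10 neurons, one output), here is a minimal Keras sketch framed as the asker's binary question (engagement rate >= 3% or not). The X and y placeholders stand in for the encoded records and labels; this is a sketch under those assumptions, not a tuned solution.

```python
# Sketch of the 5-input / 10-hidden / 1-output ANN discussed above,
# as a binary classifier for "engagement rate >= 3%".
import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

X = np.random.rand(120, 5)                    # placeholder encoded records
y = (np.random.rand(120) > 0.5).astype(int)   # placeholder 0/1 labels

model = Sequential([
    Dense(10, activation="relu", input_shape=(5,)),  # one hidden layer, 10 neurons
    Dense(1, activation="sigmoid"),                  # P(engagement >= 3%)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=50, batch_size=16, validation_split=0.2)
```

Scaling the inputs first (for example, dividing viewer counts by their maximum) usually helps training converge.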

How can batch include both positive and negative labels?

I have a pedestrian classification problem that I want to solve with VGG-16. To do that, I prepared train and test sets. My train set has 2038 images and my test set has 252 images. My batch size is 64. How can I tell Keras that I want each batch of 64 images to include both positive and negative labels during training? I don't want it to learn on only positive or only negative labels.
If you shuffle your training data (which is strongly recommended), there is only a very small chance that all 64 samples in a batch belong to the same class. For a two-class task there should be no problem with the data.
However, if you wish to guarantee balanced training batches, you can use third-party code like BalancedBatchGenerator, or roll your own as sketched below. See this tutorial for further information.
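For reference, here is a hand-rolled sketch of such a balanced generator, assuming X is your image array and y holds 0/1 labels; each batch of 64 draws 32 samples from each class.

```python
# Hand-rolled balanced batch generator: every batch mixes both classes,
# so no batch is purely positive or purely negative.
import numpy as np
from tensorflow.keras.utils import Sequence

class BalancedBatches(Sequence):
    def __init__(self, X, y, batch_size=64):
        self.X, self.y = X, np.asarray(y)
        self.half = batch_size // 2
        self.pos = np.where(self.y == 1)[0]
        self.neg = np.where(self.y == 0)[0]

    def __len__(self):
        return len(self.y) // (2 * self.half)

    def __getitem__(self, idx):
        # Half the batch from each class, shuffled together.
        batch = np.concatenate([
            np.random.choice(self.pos, self.half),
            np.random.choice(self.neg, self.half),
        ])
        np.random.shuffle(batch)
        return self.X[batch], self.y[batch]

# Usage: model.fit(BalancedBatches(X_train, y_train, 64), epochs=10)
```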

IBM watson image recognition : time taken for training

How much time does it take to train a classifier in Watson? I uploaded around 500 images, it has been 48 hours, and the model is still in training.
I am trying to differentiate plant leaves and thus gave images of plant leaves. Total file size is around 50MB.
Training a visual classifier can take some time, due to the upload speeds most people have and the size of the images being used to train the classifier. Think about how long it would take to transfer the data from the environment that you are working in, to a data center - and that is the absolute quickest that your training will be.
With that being said, I can't imagine the training taking anywhere near 48 hours. With 50 MB of data and 7-9 classes, training should take no longer than an hour at the very most.
There might have been some error, so please try retraining. This has happened to me many times; cancel the training and start it again.

Is 50 images a minimum requirement to recognize a person's face using the Watson Visual Recognition service?

I have a question on the Watson Visual Recognition service on Bluemix.
Is 50 images the minimum requirement to recognize a person's face?
What would happen if we train with fewer than 50 images? How consistent would the output be in terms of facial recognition?
The requirement is to retrieve an employee's ID by facial (visual) recognition.
Is this achievable with the Watson Visual Recognition service?
In practice, it may be a little hard to obtain 50 images of an employee or a person.
Thanks,
Priyanka
When I used Visual Recognition, I had the same doubt. Afterwards, I found this article about good practices:
The accuracy you will see from your custom classifier depends directly on the quality of the training you perform.
On a basic level, images in training and testing sets should resemble each other. Significant visual differences between training and testing groups will result in poor performance results.
There are a number of additional factors that will impact the quality of your training beyond the resolution of your images. Lighting, angle, focus, color, shape, distance from subject, and presence of other objects in the image will all impact your training. Please note that Watson takes a holistic approach when being trained on each image. While it will evaluate all of the elements listed above, it cannot be tasked to exclusively consider a specific element.
So, the service works by using a collection of classifiers. Each classifier represents a single tag only and must be trained with its own sets of positive and negative images. To improve the performance and accuracy of your classifier, the professionals recommend using a significantly greater number of images, such as hundreds or thousands (a training sketch follows the links below).
See this video to verify how it works.
Fork the example from the video on GitHub.
Official Documentation about Guidelines for training classifiers.
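To make the positive/negative training point concrete, creating such a classifier looked roughly like this against the v3 REST API of that era. This is an illustration only: the API key, zip file names, and class name are placeholders, not official sample code.

```python
# Rough sketch of training a single-tag classifier with positive and
# negative example zips. Key, file names, and class name are placeholders.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
URL = "https://gateway-a.watsonplatform.net/visual-recognition/api/v3/classifiers"

with open("employee_faces.zip", "rb") as pos, \
     open("other_faces.zip", "rb") as neg:
    resp = requests.post(
        URL,
        params={"api_key": API_KEY, "version": "2016-05-20"},
        files={
            # "<classname>_positive_examples" names the class being trained
            "employee_positive_examples": pos,
            "negative_examples": neg,
        },
        data={"name": "employee_recognizer"},
    )
resp.raise_for_status()
print(resp.json())  # returns the classifier_id and its training status
```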

Convolution Neural Network for image detection/classification

So here is the setup: I have a set of images (labeled train and test) and I want to train a conv net that tells me whether or not a specific object is within an image.
To do this, I followed the TensorFlow tutorial on MNIST, and I trained a simple conv net on crops reduced to the area of interest (the object), using images of size 128x128. The architecture is as follows: three successive blocks, each consisting of 2 conv layers and 1 max-pool down-sampling layer, followed by one fully connected softmax layer (with two classes, 0 and 1, for whether the object is present or not).
I implemented it using TensorFlow and it works quite well, but since I have enough computing power, I was wondering how I could increase the complexity of the classifier:
- adding more layers?
- adding more channels at each layer? (currently 32, 64, 128, and 1024 for the fully connected layer)
- anything else?
But the most important part is that I now want to detect this same object in larger images (roughly 600x600, whereas the size of the object should be around 100x100).
I was wondering how I could use the previously trained "small" network for small images in order to pretrain a larger network on the large images. One option could be to classify the image using a sliding window of size 128x128 and scan the whole image, but if possible I would like to try to train a whole network on it.
Any suggestion on how to proceed? Or an article/resource tackling this kind of problem? (I am really new to deep learning, so sorry if this is a stupid question...)
Thanks!
I suggest that you continue reading on the field overall. Your search keywords include CNN, image classification, neural net, AlexNet, GoogleNet, and ResNet. These will return many articles, online classes and lectures, and other materials to help you learn about classification with neural nets.
Don't just add layers or filters: the complexity of the topology (net design) must be fitted to the task; a net that's too complex will over-fit the training data. The one you've been using is probably LeNet; the three I cite above are for the ImageNet image classification contest.
Since you are working on images, I would suggest using a pretrained image classification network (like VGG, AlexNet, etc.) and fine-tuning it with your 128x128 image data. In my experience, unless you have a very large data set, a fine-tuned network will give more accuracy and also save training time.
After building a good image classifier on your data set, you can use any popular algorithm to generate region proposals from the image. Then take all the region proposals, pass them to the classification network one by one, and check whether the network classifies each proposal as positive or negative. If it classifies a proposal as positive, your object is most probably present in that region; otherwise it is not. If the classifier reports many region proposals containing the object, you can use non-maximum suppression to reduce the number of positive proposals.
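A minimal sketch of the fine-tuning suggestion, assuming Keras with its bundled ImageNet VGG16 weights: freeze the pretrained convolutional base and train only a small new head for the binary object-present task.

```python
# Fine-tuning sketch: reuse pretrained VGG16 features, train a new
# binary head on 128x128 crops (object present vs. absent).
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet", include_top=False,
             input_shape=(128, 128, 3))
base.trainable = False                      # freeze pretrained features

x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)
out = Dense(1, activation="sigmoid")(x)     # P(object present)

model = Model(base.input, out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(train_crops, train_labels, epochs=5, validation_split=0.1)
```

The same model can then score sliding-window crops or region proposals from the 600x600 images, with non-maximum suppression merging overlapping positives.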