Flickr API: What are the "vision" tags? - tags

When querying the Flickr API and checking the returned tags, I noticed that I receive additional tags that are not shown on the web interface. For example, for this image:
http://www.flickr.com/photos/77060598@N08/12078886973
Besides the tags shown on the web page (Nikon F2AS, Nikon, Black and White, B&W, Mountains, Germany, Snow, Landscape, Sky, Clouds), the JSON response contains the tags vision:outdoor=0949 and vision:sky=051.
I assume that Flickr applies some computer vision processing to assign these tags automatically. Is this assumption correct? I cannot find any documentation about these tags. Is there a description of the algorithms they use, of the kinds of tags, or of the meaning of the numbers they assign?

Yes, your assumption is right. These tags are image classification tags.
They are part of ongoing research in the area of classification and computational learning.
The goal of the research is precise, category-based image classification with minimal learning effort.
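The tags in the question follow Flickr's machine-tag syntax (namespace:predicate=value). A minimal Python sketch for separating such tags from ordinary ones might look like this. It assumes you have already pulled the raw tag strings out of the JSON response; the regular expression is an approximation of the machine-tag format, not Flickr's exact rule, and the value is kept as a string because the meaning of the numbers is not documented.

```python
import re

# Approximate machine-tag pattern: namespace:predicate=value (assumption).
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[A-Za-z_][A-Za-z0-9_]*):(?P<predicate>[A-Za-z_][A-Za-z0-9_]*)=(?P<value>.+)$"
)


def split_machine_tags(tags):
    # Return (plain_tags, machine_tags); machine tags become (namespace, predicate, value).
    plain, machine = [], []
    for tag in tags:
        m = MACHINE_TAG.match(tag)
        if m:
            machine.append((m["namespace"], m["predicate"], m["value"]))
        else:
            plain.append(tag)
    return plain, machine


plain, machine = split_machine_tags(["Nikon", "B&W", "vision:outdoor=0949", "vision:sky=051"])
print(machine)  # [('vision', 'outdoor', '0949'), ('vision', 'sky', '051')]
```

Filtering on the namespace ("vision") is then enough to keep or drop the automatically assigned tags.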
Yahoo Large-Scale Flickr-Tag Image Classification Challenge
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J. and Zisserman, A. The PASCAL Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 88(2), 303-338, 2010.
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
Training and Test Data
Results & Leaderboard

Related

Why such a bad performance for Moses using Europarl?

I have started playing around with Moses and tried to build what I believe is a fairly standard baseline system. I basically followed the steps described on the website, but instead of news-commentary I used Europarl v7 for training, with the WMT 2006 development set and the original Europarl common test set. My idea was to do something similar to Le Nagard & Koehn (2010), who obtained a BLEU score of .68 in their baseline English-to-French system.
To summarise, my workflow was more or less this:
tokenizer.perl on everything
lowercase.perl (instead of truecase)
clean-corpus-n.perl
Trained an IRSTLM language model using only the French data from Europarl v7
train-model.perl exactly as described
mert-moses.pl using WMT 2006 dev
Testing and measuring performance as described
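For readers unfamiliar with the cleaning step: as far as I understand it, clean-corpus-n.perl drops sentence pairs that are empty, too long, or too unbalanced in length. The Python sketch below only mimics that idea; the function name and the limits (1 to 80 tokens, length ratio 9) are assumptions for illustration, not necessarily the script's defaults or the values used on the Moses website.

```python
def clean_pairs(pairs, min_len=1, max_len=80, max_ratio=9.0):
    # Keep only sentence pairs whose token counts are within [min_len, max_len]
    # on both sides and whose length ratio does not exceed max_ratio.
    kept = []
    for src, tgt in pairs:
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
            continue
        if min(n_src, n_tgt) == 0:  # guards the ratio if min_len is set to 0
            continue
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        kept.append((src, tgt))
    return kept


pairs = [
    ("resumption of the session", "reprise de la session"),
    ("", "vide"),  # empty source side
    ("yes", "un deux trois quatre cinq six sept huit neuf dix"),  # ratio 10
]
print(len(clean_pairs(pairs)))  # 1
```

Running the real script on your data is still the way to go; the sketch just shows what kind of pairs disappear.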
The resulting BLEU score is .26, which leads me to two questions:
Is this a typical BLEU score for this kind of baseline system? I realise Europarl is a pretty small corpus to train a monolingual language model on, even though this is how they do things on the Moses website.
Are there any typical pitfalls for someone just starting with SMT and/or Moses I may have fallen in? Or do researchers like Le Nagard & Koehn build their baseline systems in a way different from what is described on the Moses website, for instance using some larger, undisclosed corpus to train the language model?
Just to put things straight first: the .68 you are referring to has nothing to do with BLEU.
My idea was to do something similar to Le Nagard & Koehn (2010), who obtained a BLEU score of .68 in their baseline English-to-French system.
The article you refer to only states that 68% of the pronouns (using co-reference resolution) were translated correctly. It nowhere mentions a BLEU score of .68. As a matter of fact, no scores were given, probably because the qualitative improvement the paper proposes cannot be measured with statistical significance (which happens a lot if you only improve on a small number of words). For this reason, the paper uses a manual evaluation of the pronouns only:
A better evaluation metric is the number of correctly translated pronouns. This requires manual inspection of the translation results.
This is where the .68 comes into play.
Now to answer your questions with respect to the .26 you got:
Is this a typical BLEU score for this kind of baseline system? I realise Europarl is a pretty small corpus to train a monolingual language model on, even though this is how they do things on the Moses website.
Yes, it is. You can find results for the WMT language pairs at http://matrix.statmt.org/
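To put the numbers in context, it helps to recall what BLEU computes: the geometric mean of modified n-gram precisions (n = 1 to 4) multiplied by a brevity penalty, which gives a value between 0 and 1. The Python sketch below is a simplified corpus-level version (one reference per sentence, whitespace tokens, no smoothing) meant only for illustration; it is not Moses' multi-bleu.perl and will not reproduce its numbers exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # Counter of all n-grams (as tuples) in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ng, r_ng = ngrams(h, n), ngrams(r, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_ng[g]) for g, c in h_ng.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


print(corpus_bleu(["the cat sat on a mat"], ["the cat sat on the mat"]))  # about 0.537
```

A single wrong word already pulls this toy sentence down to roughly .54, which gives a feel for why scores in the .20s and .30s are normal for whole test sets.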
Are there any typical pitfalls for someone just starting with SMT and/or Moses I may have fallen in? Or do researchers like Le Nagard & Koehn build their baseline systems in a way different from what is described on the Moses website, for instance using some larger, undisclosed corpus to train the language model?
I assume that you trained your system correctly. With respect to the "undisclosed corpus" question: members of the academic community normally state for each experiment which data sets were used for training, testing and tuning, at least in peer-reviewed publications. The only exception is the WMT task (see, for example, http://www.statmt.org/wmt14/translation-task.html), where privately owned corpora may be used if the system participates in the unconstrained track. But even then, people will mention that they used additional data.

Clearing Mesh of Graph

When we visualize documents as a graph, the graph built across multiple documents often turns into a dense mesh. With a small amount of data it is easy to get a clear picture, which is why summarization helps. But when the number of documents reaches the millions, even the summarized graph forms a big mesh.
I am a bit perplexed about how to clear up the mesh. Reading and working through http://www.jerrytalton.net/research/Talton04SSMSA.report/Talton04SSMSA.pdf has not been much help, as the data is huge.
I would be grateful if anyone could help me out.
Regards,
SK
Are you talking about creating a graph or network of the documents? For example, you could have a network of documents linked by their citations, by shared authors, by the same terms appearing in them, etc. This isn't generally called a mesh problem; instead, it is an automatic graph layout problem.
You need either better layout algorithms or some kind of clustering and reduction. There are many clustering algorithms you can use, for example Wakita & Tsurumi's:
Ken Wakita and Toshiyuki Tsurumi. 2007. Finding community structure in mega-scale social networks: [extended abstract]. Proc. 16th international conference on World Wide Web (WWW '07). 1275-1276. DOI=10.1145/1242572.1242805.
One that is particularly targeted at reducing complexity through "graph summarization" is Navlakha et al. 2008:
Saket Navlakha, Rajeev Rastogi, and Nisheeth Shrivastava. 2008. Graph summarization with bounded error. Proc. 2008 ACM SIGMOD international conference on Management of data (SIGMOD '08). 419-432. DOI=10.1145/1376616.1376661.
You could also check out my latest paper, which replaces common repeating patterns in the network with representative glyphs:
Dunne, C. & Shneiderman, B. 2013. Motif simplification: improving network visualization readability with fan, connector, and clique glyphs. Proc. 2013 SIGCHI Conference on Human Factors in Computing Systems (CHI '13).
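Fan and connector glyphs replace groups of nodes that play the same structural role. A much-simplified Python sketch of that idea (not the algorithm from the paper) is to collapse nodes that have exactly the same set of neighbours into one super-node. The function name and the toy graph below are hypothetical.

```python
from collections import defaultdict


def collapse_equivalent_nodes(edges):
    # Group nodes with identical neighbour sets and merge each group into
    # one super-node, named by the sorted tuple of its members.
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    groups = defaultdict(list)
    for node, nbrs in neighbours.items():
        groups[frozenset(nbrs)].append(node)
    supernode = {}
    for members in groups.values():
        key = tuple(sorted(members))
        for m in members:
            supernode[m] = key
    # Nodes with identical neighbour sets are never adjacent, so every
    # original edge maps to an edge between two different super-nodes.
    reduced = {tuple(sorted((supernode[u], supernode[v]))) for u, v in edges}
    return supernode, reduced


# Toy graph: hub A with three leaves (a "fan"), plus a short chain A-B-C.
edges = [("A", "x1"), ("A", "x2"), ("A", "x3"), ("A", "B"), ("B", "C")]
supernode, reduced = collapse_equivalent_nodes(edges)
print(supernode["x1"], len(reduced))  # ('x1', 'x2', 'x3') 3
```

On real document networks, many leaves hanging off the same hub collapse this way, which removes a large share of the visual clutter before layout.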
Here's an example picture of the reduction possible (image not included).

Clustering or classification?

I am stuck deciding whether to apply classification or clustering to the data set I have. The more I think about it, the more confused I get. Here is what I am dealing with.
I have news documents (around 3000, and the number keeps increasing) containing news about companies, investments, stocks, the economy, quarterly income, etc. My goal is to sort the news so that I know which news items correspond to which company. E.g., for the news item "Apple launches new iphone", I need to associate the company Apple with it. A news item/document contains only a 'title' and a 'description', so I have to analyze the text to find out which company the news refers to. It could be multiple companies too.
To solve this, I turned to Mahout.
I started with clustering. I was hoping to get 'Apple', 'Google', 'Intel', etc. as top terms in my clusters, so that I would know the news in a cluster corresponds to its cluster label, but things turned out differently. I got 'investment', 'stocks', 'correspondence', 'green energy', 'terminal', 'shares', 'street', 'olympics' and lots of other terms as the top ones (which makes sense, as clustering algorithms look for common terms). There were some 'Apple' clusters, but very few news items were associated with them. I thought maybe clustering is not suited to this kind of problem, as much of the company news goes into more general clusters (investment, profit) instead of the specific company cluster (Apple).
Then I started reading about classification, which requires training data. The name was convincing too, as I actually want to 'classify' my news items into 'company names'. As I read on, I got the impression that the name is a bit deceiving and the technique is used more for prediction than for classification. Another source of confusion was how to prepare training data for news documents. Let's assume I have a list of companies that I am interested in, and I write a program to produce training data for the classifier: if a news title or description contains the company name 'Apple', then it's a news story about Apple. Is this how I can prepare training data? (Of course, I have read that training data is actually a set of predictors and target variables.) If so, why should I use Mahout classification in the first place? I could ditch Mahout and just use the little program I wrote for generating training data (which actually does the classification).
You can see how confused I am about how to address this issue. Another thing I wonder about is whether it is possible to make a system intelligent enough that, if the news says 'iphone sales at a record high' without using the word 'Apple', it can still classify the item as news related to Apple.
Thank you in advance for pointing me in the right direction.
Copying my reply from the mailing list:
Classifiers are supervised learning algorithms, so you need to provide a bunch of examples of positive and negative classes. In your example, it would be fine to label a bunch of articles as "about Apple" or not, then use feature vectors derived from TF-IDF as input, with these labels, to train a classifier that can tell when an article is "about Apple".
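To make that concrete, here is a small, self-contained Python sketch: TF-IDF vectors plus a simple nearest-centroid classifier trained on a handful of hand-labelled titles. This is not Mahout's API; the choice of classifier, the toy titles and the labels "apple"/"other" are all illustrative assumptions, and a real system would need many more labelled examples.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    # Smoothed inverse document frequency for every term seen in training.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    # L2-normalised TF-IDF vector; terms never seen in training are ignored.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def train_centroids(docs, labels, idf):
    # One centroid per label: sum of its documents' vectors, L2-normalised.
    sums = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        sums[label].update(tfidf(doc, idf))
    centroids = {}
    for label, vec in sums.items():
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        centroids[label] = {t: v / norm for t, v in vec.items()}
    return centroids


def classify(doc, idf, centroids):
    # Pick the label whose centroid has the highest cosine similarity.
    vec = tfidf(doc, idf)

    def score(label):
        return sum(w * centroids[label].get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=score)


# Hypothetical hand-labelled training titles.
train = [
    ("Apple launches new iPhone", "apple"),
    ("iPhone sales hit record for Apple", "apple"),
    ("Apple reports quarterly profit on iPhone and Mac", "apple"),
    ("Apple juice linked to lower dementia risk", "other"),
    ("Ford recalls trucks", "other"),
    ("Orchard harvest of apple and pear fruit", "other"),
]
docs = [doc for doc, _ in train]
labels = [label for _, label in train]
idf = fit_idf(docs)
centroids = train_centroids(docs, labels, idf)
print(classify("iPhone sales at a record high", idf, centroids))  # apple
```

Note that the last title is classified without containing the word "Apple", because the labelled examples taught the classifier that "iphone" and "sales" co-occur with the Apple class.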
I don't think it will quite work to automatically generate the training set by labeling according to the simple rule that an article is about Apple if 'Apple' is in the title. Well, if you do that, then there is no point in training a classifier. You can make a trivial classifier that achieves 100% accuracy on your test set by just checking whether 'Apple' is in the title! Yes, you are right, this gains you nothing.
Clearly you want to learn something subtler from the classifier, so that an article titled "Apple juice shown to reduce risk of dementia" isn't classified as about the company. You'd really need to feed it hand-classified documents.
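The weakness of the keyword rule is easy to demonstrate. This hypothetical Python helper implements exactly the labelling rule proposed in the question; it produces a false positive on the juice headline and misses the iPhone headline.

```python
def naive_label(title, company="Apple"):
    # The labelling rule from the question: "about <company>" iff the
    # name appears as a word in the title (case-insensitive).
    return company.lower() in title.lower().split()


print(naive_label("Apple juice shown to reduce risk of dementia"))  # True: a false positive
print(naive_label("iPhone sales at a record high"))                 # False: a miss
```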
That's the bad news, but you can certainly train N classifiers for N topics this way.
Classifiers put items into a class or not. They are not the same as regression techniques, which predict a continuous value for an input. They're related but distinct.
Clustering has the advantage of being unsupervised. You don't need labels. However, the resulting clusters are not guaranteed to match up to your notion of article topics. You may see a cluster that has a lot of Apple articles, some about the iPod, but also some about Samsung and laptops in general. I don't think this is the best tool for your problem.
First of all, you don't need Mahout. 3000 documents is close to nothing. Revisit Mahout when you hit a million. I've been processing 100,000 images on a single computer, so you can really skip the overhead of Mahout for now.
What you are trying to do sounds like classification to me, because you have predefined classes.
A clustering algorithm is unsupervised. It will (unless you overfit the parameters) likely break Apple into "iPad/iPhone" and "MacBook". Or, on the other hand, it may merge Apple and Google, as they are closely related (much more than, say, Apple and Ford).
Yes, you need training data that reflects the structure you want to measure. There is other structure (e.g. iPhones not being the same as MacBooks, and Google, Facebook and Apple being more similar companies than Kellogg's, Ford and Apple). If you want structure at the company level, you need training data at that level of detail.

What is the relation between OCR and Artificial Neural Network?

I saw different articles about OCR form recognition (data extraction) saying that they used a neural network to do form recognition, so what is the relation between artificial neural networks (ANNs) and form recognition? If I want to extract fields from a business card, is an ANN required or optional? In other words, when do I need to use an ANN and when don't I?
It's a little different. An ANN is just one "expert" used in OCR, and OCR engines contain many experts. When you study ANNs you will build a simple OCR engine using just an ANN, but it does not compare to modern engines that use the ANN in conjunction with trigrams, morphology, data types (very important for BCR and forms), dictionaries, connected-components algorithms, etc. So look at it as just one of the tools in the bag of tricks for extracting quality results. A good engine will incorporate the ANN and all the others. In BCR (business card recognition) there are additional considerations: it should rely very heavily on connected components and dictionaries first, and then use the ANN and pattern matching for the actual recognition.
An ANN is one way to perform OCR; there are others. Hence, if you want to extract fields from a business card, using an ANN is optional.
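As a rough illustration of the "dictionaries" expert, the Python sketch below snaps a recognized token to the closest word in a word list by Levenshtein (edit) distance. The word list, the function names and the max_distance cutoff are assumptions for illustration; real engines combine this with character confidences, n-grams and field types.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert, delete, substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def correct_token(token, dictionary, max_distance=2):
    # Return the closest dictionary word, or the token itself if none is close enough.
    best = min(dictionary, key=lambda w: levenshtein(token.lower(), w))
    return best if levenshtein(token.lower(), best) <= max_distance else token


print(correct_token("c0mpany", ["company", "compact", "phone"]))  # company
```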
Good question. I recently spent some time playing with OCRopus, a Google project that does OCR; you can get it for free and play with it yourself. I'm pretty sure that it has an ANN as one of the modules behind it. However, the whole process of optical character recognition can have many steps (lots of different little modules that each do something and pass the results to the next module).
So, here are some of the things I remember as being done by modules in that project:
There was a module that turned the image into black and white - this makes it easier for later modules to deal with.
Getting rid of speckles.
Straightening out the lines of text.
Breaking lines of text into individual words (it's been a few weeks, not sure about this one)
Basically, you can do the above with little bits of code that don't involve a neural net, which is simpler.
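For example, the black-and-white step needs no neural network at all. The Python sketch below uses Otsu's method, a common global thresholding technique; it is not necessarily what OCRopus does, and the tiny "image" is made up.

```python
def otsu_threshold(pixels):
    # Otsu's method: pick the grey level t that maximises the between-class
    # variance of the two classes (<= t) and (> t). pixels: ints in 0..255.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_lo = sum_lo = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_lo += hist[t]
        if weight_lo == 0:
            continue
        weight_hi = total - weight_lo
        if weight_hi == 0:
            break
        sum_lo += t * hist[t]
        mean_lo = sum_lo / weight_lo
        mean_hi = (sum_all - sum_lo) / weight_hi
        between = weight_lo * weight_hi * (mean_lo - mean_hi) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(image):
    # image: list of rows of grey values; dark ink becomes 1, paper becomes 0.
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


print(binarize([[10, 20, 210], [200, 30, 220]]))  # [[1, 1, 0], [0, 1, 0]]
```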
The neural net, I think, is used just to recognize the individual characters, that is, to decide which of a group of possible characters a given shape is.
There's a training command in OCRopus that I had running for over a week, and it kept sending line samples to the map, slowly changing the map as it went. I think it was training the ANN part.
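To give a feel for the character-recognition part, here is a toy single-neuron perceptron in Python that learns to tell a 3x3 "I" from a 3x3 "O". It only illustrates training a tiny neural network on character bitmaps; OCRopus's actual recognizer is far more elaborate, and the bitmaps and names here are made up.

```python
I_GLYPH = [0, 1, 0,
           0, 1, 0,
           0, 1, 0]
O_GLYPH = [1, 1, 1,
           1, 0, 1,
           1, 1, 1]


def predict(weights, bias, x):
    # +1 means "I", -1 means "O".
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else -1


def train_perceptron(samples, epochs=20, lr=1):
    # Classic perceptron rule: nudge the weights towards misclassified samples.
    weights, bias = [0] * len(samples[0][0]), 0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            if predict(weights, bias, x) != y:
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                errors += 1
        if errors == 0:
            break
    return weights, bias


weights, bias = train_perceptron([(I_GLYPH, 1), (O_GLYPH, -1)])
smudged_i = [0, 1, 0, 0, 1, 0, 0, 0, 0]  # bottom pixel missing
print(predict(weights, bias, smudged_i))  # 1, i.e. "I"
```

A real recognizer uses many more inputs, hidden layers and training samples, which is why training can run for days.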

Where can I download tagged word dictionary and rule?

I am learning to do part-of-speech tagging by applying transformation rules. The first step is to assign the possible POS tags to each word in a text using a dictionary like:
communicative JJ
communicator NN
communicators NNS
communion NN
communique NN
communiques NNS
communism NN
The second step is to apply transformation rules to change the tags. I have only a very small dictionary containing the word/tag pairs above. Where can I find a large one, and where can I find transformation rules? I have read that transformation-based tagging can involve a lot of rules. Where can I find them?
Thank you in advance.
You can obtain the possible tags for each word from a tagged corpus, such as those available in NLTK. That would also give you frequencies from which to estimate probabilities, if you want to do machine-learned tagging (Brill-style).
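A minimal Python sketch of the first step, assuming you already have a tagged corpus as (word, tag) pairs; NLTK's tagged corpora (for example `nltk.corpus.treebank.tagged_words()`) give you data in that shape. The tiny corpus below is made up, and tagging unknown words as NN is an assumed default.

```python
from collections import Counter, defaultdict


def build_lexicon(tagged_words):
    # word -> Counter of the tags seen with it in the corpus
    lexicon = defaultdict(Counter)
    for word, tag in tagged_words:
        lexicon[word.lower()][tag] += 1
    return lexicon


def initial_tag(lexicon, word, default="NN"):
    # Most frequent tag for known words; a fixed default for unknown words.
    tags = lexicon.get(word.lower())
    return tags.most_common(1)[0][0] if tags else default


corpus = [("to", "TO"), ("race", "VB"), ("the", "DT"), ("race", "NN"),
          ("a", "DT"), ("race", "NN"), ("communism", "NN")]
lexicon = build_lexicon(corpus)
print(dict(lexicon["race"]))                              # {'VB': 1, 'NN': 2}
print([initial_tag(lexicon, w) for w in ["to", "race"]])  # ['TO', 'NN']
```

The counts double as the frequency estimates mentioned above; the most-frequent-tag guess is the usual starting point that transformation rules then correct.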
The rules must be handcrafted, after which the machine learner can find out when to apply which ones. See, e.g., Brill's PhD thesis for English rules.
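Applying a single transformation is simple. The Python sketch below applies one rule of the form "change tag A to B when the previous tag is C"; the example rule (NN to VB after TO) is a classic textbook example of the kind of rule a Brill-style learner ends up with, while the function and its single rule template are illustrative assumptions.

```python
def apply_rule(tags, from_tag, to_tag, prev_tag):
    # Brill-style transformation: change from_tag to to_tag wherever the
    # previous word's tag is prev_tag. Contexts are read from the input
    # tags, so one application of the rule cannot trigger itself.
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


# "I want to race", with "race" initially tagged by its most frequent tag.
tags = ["PRP", "VBP", "TO", "NN"]
print(apply_rule(tags, "NN", "VB", "TO"))  # ['PRP', 'VBP', 'TO', 'VB']
```

A full tagger applies an ordered list of such rules one after another to the whole text.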