"Who Bought This Item Also Bought" type of recommendation with matrix factorization - collaborative-filtering

I know that "Who Bought This Item Also Bought" recommendations can be produced with item-based collaborative filtering. My question is how to do this with matrix factorization (MF). One possible solution is to learn item features with MF and then compute item similarities from those features. But this is not "pure MF": in the end I still need to compute the similarities between all pairs of item feature vectors, which takes $O(n^2)$ time. Any ideas?
Thanks
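For illustration, here is a minimal sketch of the approach described in the question: take item factor vectors (assumed to have been learned by some MF model already) and rank the other items by cosine similarity to a query item. The item names and factor values are hypothetical. Note that answering one query this way costs O(n); the O(n^2) cost only arises if all pairs are precomputed.

```python
import math

# Hypothetical item factors, as if learned by an MF model (one vector per item).
item_factors = {
    "camera":   [0.9, 0.1],
    "tripod":   [0.8, 0.2],
    "lens":     [0.85, 0.05],
    "cookbook": [0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length factor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def also_bought(item, factors, k=2):
    """Return the k items whose factor vectors are most similar to `item`."""
    query = factors[item]
    scored = [(cosine(query, vec), other)
              for other, vec in factors.items() if other != item]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]

print(also_bought("camera", item_factors))
```

For a large catalogue, the linear scan inside also_bought could be replaced by an approximate nearest-neighbor index over the factor vectors; that is a design option, not something the question specifies.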

This is not an answer to your question but another question for you. You mentioned that "it is possible to do "Who Bought This Item Also Bought" type of recommendation using item-based collaborative filtering". Can you recommend some source links?

Related

Multiple object tracking using radar data and extended kalman filter

Thanks in advance.
I am new to the multiple object tracking field and have been working on this for a couple of days. I have developed a first version of a single-object tracker using an extended Kalman filter. I am estimating position and velocity under a constant-acceleration model. My question is how to extend the existing model to track multiple objects. The main problem is that I am using radar data, and I have not been able to find references for developing such a tracker. A good example, or the steps to follow, would help me understand the concept.
The answer to this question depends on a lot of things. For example, how much control and knowledge do you have over the whole system? If you know how many targets you need to track you can add all of them to the Kalman Filter state and for every measurement you perform data association to find out to which object a given measurement belongs. An easy association metric would be nearest neighbor.
If you don't know how many targets there will be, you will want to implement track management, where each target you are tracking is represented by a track and you model the birth and death probabilities of targets.
Multi Target Tracking is a vast field and if you want to have an in-depth mathematical introduction I would recommend the 2015 survey paper "Multitarget Tracking" by Ba-Ngu Vo et al. You should be able to find a preprint pdf online.
If you are looking for a more lightweight tutorial, it should be possible to find tutorials or example code online to start from. As mentioned in the first paragraph, nearest neighbor association for a fixed number of objects might be a good first step.
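As a rough sketch of that first step, the following greedy nearest-neighbor association matches each track's predicted position to at most one measurement. It assumes the radar measurements have already been converted to Cartesian (x, y) and that each track's filter supplies a predicted position; the function name, the gate value and the data are all hypothetical.

```python
import math

def associate(predicted, measurements, gate=5.0):
    """Greedy nearest-neighbor data association.

    predicted:    {track_id: (x, y)} predicted position from each track's filter
    measurements: list of (x, y) detections in the current scan
    gate:         maximum distance at which a pairing is accepted (assumed value)
    Returns {track_id: measurement_index}; unmatched tracks are omitted.
    """
    pairs = []
    for tid, p in predicted.items():
        for j, z in enumerate(measurements):
            d = math.hypot(p[0] - z[0], p[1] - z[1])
            if d <= gate:
                pairs.append((d, tid, j))
    pairs.sort()  # closest candidate pairs first

    assignment, used = {}, set()
    for d, tid, j in pairs:
        if tid not in assignment and j not in used:
            assignment[tid] = j
            used.add(j)
    return assignment

predicted = {1: (0.0, 0.0), 2: (10.0, 10.0)}
scan = [(9.5, 10.2), (0.3, -0.1), (50.0, 50.0)]
print(associate(predicted, scan))
```

Each matched measurement would then be fed to the update step of that track's own filter. A measurement left unmatched, such as the third one here, is a candidate for starting a new track once track management is added.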

Recommendation algorithm for suggesting jobs to workers (crowdsourcing platform)

I have crawled the MTurk website and have 260 HITs as a dataset. A certain number of users have selected HITs from this dataset and assigned a rating to each selected HIT. Now I want to give recommendations to these users based on their selections. How can this be done? Can anyone recommend a recommendation algorithm?
It sounds like you should go for one of the collaborative filtering (CF) algorithms, as the users have given explicit feedback in the form of ratings. First, I would suggest implementing a simple item- or user-based k-nearest-neighbours algorithm. If the results do not satisfy you, perhaps because your data is very sparse, matrix factorization techniques should probably do the trick. A good survey I read recently is [1]; it compares the different methods under different data settings.
If you feel comfortable with this and you realize that what you actually need is a ranked list of top-N predictions rather than ratings, I would suggest reading about, e.g., Bayesian Personalized Ranking [2].
And the best part is that these algorithms are really well known and most of them are available for almost every programming language, e.g. Python: https://github.com/Mendeley/mrec/
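To make the first suggestion concrete, here is a minimal sketch of item-based k-nearest-neighbours rating prediction in plain Python. The worker ids, HIT ids and ratings are made up, and cosine similarity over raw ratings is just one possible similarity choice; a library implementation would normally be preferable on real data.

```python
import math

# Hypothetical ratings: worker -> {HIT id: rating on a 1-5 scale}.
ratings = {
    "w1": {"hit_a": 5, "hit_b": 4, "hit_c": 1},
    "w2": {"hit_a": 4, "hit_b": 5},
    "w3": {"hit_a": 1, "hit_c": 5},
    "w4": {"hit_b": 4, "hit_c": 2},
}

def item_vectors(ratings):
    """Invert the data to item -> {worker: rating}."""
    vecs = {}
    for user, items in ratings.items():
        for item, r in items.items():
            vecs.setdefault(item, {})[user] = r
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse item vectors."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def predict(user, item, ratings, k=2):
    """Similarity-weighted average of the user's ratings on the k most similar items."""
    vecs = item_vectors(ratings)
    neighbors = sorted(
        ((cosine(vecs[item], vecs[other]), r)
         for other, r in ratings[user].items() if other != item),
        reverse=True,
    )[:k]
    total = sum(s for s, _ in neighbors)
    if total == 0:
        return None  # no usable neighbors
    return sum(s * r for s, r in neighbors) / total

print(predict("w2", "hit_c", ratings))
```

Ranking a worker's unrated HITs by this predicted value gives a simple recommendation list.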
[1] J. Lee, M. Sun, and G. Lebanon, “A Comparative Study of Collaborative Filtering Algorithms,” arXiv, pp. 1–27, 2012.
[2] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, “BPR: Bayesian Personalized Ranking from Implicit Feedback,” in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 2009, pp. 452–461.

Matrix factorization based recommendation for like/dislike/unknown data

Most of the literature focuses on either explicit rating data or implicit (like/unknown) data. Are there any good publications on handling like/dislike/unknown data? That is, the data matrix contains three values, and I'd like to recommend from the unknown entries.
And are there any good open source implementations of this?
Thanks.
With like and dislike, you already have explicit rating data. You can use standard collaborative filtering with user and item normalization. You can also check out OrdRec: An Ordinal Model for Predicting Personalized Item Rating Distributions, which just takes an ordinal ranking of item ratings. That is, you can say that Like is better than Dislike, and let the algorithm figure out the best ranking-to-rating mapping before doing standard item-item collaborative filtering. Download LensKit and use the included OrdRec algorithm.
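As a small illustration of the normalization step only (this is not OrdRec, and not LensKit's API), the sketch below encodes like as +1 and dislike as -1, leaves unknown entries absent, and fits a global mean plus user and item offsets. The encoding and the toy data are assumptions; the residuals left after removing these offsets are what a standard CF model would then operate on.

```python
# Hypothetical feedback: +1 = like, -1 = dislike; unknown entries are simply absent.
feedback = {
    "u1": {"i1": 1, "i2": 1, "i3": -1},
    "u2": {"i1": 1, "i3": -1},
    "u3": {"i2": -1, "i3": -1},
}

def fit_baselines(feedback):
    """Return (global mean, user offsets, item offsets)."""
    values = [r for items in feedback.values() for r in items.values()]
    mu = sum(values) / len(values)
    # User offset: how much more (or less) than average this user likes things.
    b_user = {u: sum(r - mu for r in items.values()) / len(items)
              for u, items in feedback.items()}
    # Item offset: mean of what is left after removing mu and the user offset.
    leftovers = {}
    for u, items in feedback.items():
        for i, r in items.items():
            leftovers.setdefault(i, []).append(r - mu - b_user[u])
    b_item = {i: sum(v) / len(v) for i, v in leftovers.items()}
    return mu, b_user, b_item

def baseline_score(user, item, mu, b_user, b_item):
    """Baseline estimate for an unknown (user, item) entry."""
    return mu + b_user.get(user, 0.0) + b_item.get(item, 0.0)

mu, b_user, b_item = fit_baselines(feedback)
print(baseline_score("u2", "i2", mu, b_user, b_item))
print(baseline_score("u3", "i1", mu, b_user, b_item))
```

Unknown entries can be ranked per user by this score as a starting point, before adding the neighborhood or factorization model on top.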

How to generate recommendation with matrix factorization

I've read some papers on matrix factorization (latent factor models) in recommender systems, and I can implement the algorithm. On the MovieLens dataset I get RMSE results similar to those reported in the papers.
However, I found that if I generate a top-K (e.g. K=10) list of recommended movies for every user by ranking the predicted ratings, the highest-ranked movies appear to be the same for all users.
Is that just how it works, or have I got something wrong?
This is a known problem in recommendation.
It is sometimes called the "Harry Potter" effect: (almost) everybody likes Harry Potter.
So most automated procedures will find out which items are generally popular, and recommend those to the users.
You can either filter out very popular items, or multiply the predicted rating by a factor that is lower the more globally popular an item is.
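The second option could look roughly like the sketch below. The discount factor (1 + count) ** -alpha is only one possible choice of a factor that shrinks with global popularity; the movie names, rating counts and the value of alpha are invented for the example.

```python
def rerank(predicted, popularity, k=2, alpha=0.3):
    """Rank items by predicted rating times a popularity discount.

    predicted:  {item: predicted rating for one user}
    popularity: {item: number of ratings the item has received globally}
    alpha:      discount strength (assumed value); 0 disables the discount
    """
    scored = [(rating * (1 + popularity.get(item, 0)) ** -alpha, item)
              for item, rating in predicted.items()]
    scored.sort(reverse=True)  # best adjusted score first
    return [item for _, item in scored[:k]]

predicted = {"harry_potter": 4.8, "blockbuster": 4.7,
             "niche_documentary": 4.5, "cult_classic": 4.4}
popularity = {"harry_potter": 5000, "blockbuster": 3000,
              "niche_documentary": 40, "cult_classic": 120}

print(rerank(predicted, popularity, alpha=0.0))  # plain ranking by predicted rating
print(rerank(predicted, popularity, alpha=0.3))  # popularity-discounted ranking
```

The strength of the discount is a tuning decision: too large a value simply recommends obscure items regardless of predicted rating.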

Clustering or classification?

I am stuck deciding whether to apply classification or clustering to the data set I have. The more I think about it, the more confused I get. Here's what I am confronted with.
I have news documents (around 3000 and continuously increasing) containing news about companies, investment, stocks, the economy, quarterly income, etc. My goal is to sort the news so that I know which news item corresponds to which company. E.g. for the news item "Apple launches new iphone", I need to associate the company Apple with it. A news item/document contains only a 'title' and a 'description', so I have to analyze the text to find out which company the news refers to. It could be multiple companies too.
To solve this, I turned to Mahout.
I started with clustering. I was hoping to get 'Apple', 'Google', 'Intel', etc. as the top terms in my clusters, so that I would know that the news in a cluster corresponds to its cluster label, but things turned out differently. I got 'investment', 'stocks', 'correspondence', 'green energy', 'terminal', 'shares', 'street', 'olympics' and lots of other terms as the top ones (which makes sense, as clustering algorithms look for common terms). There were some 'Apple' clusters, but very few news items were associated with them. I thought maybe clustering is not suited to this kind of problem, as much of the company news goes into more general clusters (investment, profit) instead of the specific company cluster (Apple).
I started reading about classification, which requires training data. The name was convincing too, as I actually want to 'classify' my news items into 'company names'. As I read on, I got the impression that the name classification is a bit deceiving and that the technique is used more for prediction than for classification. My other confusion was: how can I prepare training data for news documents? Let's assume I have a list of companies that I am interested in, and I write a program to produce training data for the classifier. The program checks whether the news title or description contains the company name 'Apple'; if so, it is a news story about Apple. Is this how I can prepare training data? (Of course, I read that training data is actually a set of predictors and target variables.) If so, then why should I use Mahout classification in the first place? I should ditch Mahout and instead use the little program that I wrote to produce the training data (which actually does the classification).
You can see how confused I am about how to address this issue. Another thing that concerns me is whether it is possible to make the system intelligent enough that, if the news says 'iphone sales at a record high' without using the word 'Apple', it can still classify it as news related to Apple.
Thank you in advance for pointing me in the right direction.
Copying my reply from the mailing list:
Classifiers are supervised learning algorithms, so you need to provide a bunch of examples of positive and negative classes. In your example, it would be fine to label a bunch of articles as "about Apple" or not, then use feature vectors derived from TF-IDF as input, with these labels, to train a classifier that can tell when an article is "about Apple".
I don't think it will quite work to automatically generate the training set by labeling according to the simple rule, that it is about Apple if 'Apple' is in the title. Well, if you do that, then there is no point in training a classifier. You can make a trivial classifier that achieves 100% accuracy on your test set by just checking if 'Apple' is in the title! Yes, you are right, this gains you nothing.
Clearly you want to learn something subtler from the classifier, so that an article titled "Apple juice shown to reduce risk of dementia" isn't classified as about the company. You'd really need to feed it hand-classified documents.
That's the bad news, but, sure you can certainly train N classifiers for N topics this way.
Classifiers put items into a class or not. They are not the same as regression techniques which predict a continuous value for an input. They're related but distinct.
Clustering has the advantage of being unsupervised. You don't need labels. However the resulting clusters are not guaranteed to match up to your notion of article topics. You may see a cluster that has a lot of Apple articles, some about the iPod, but also some about Samsung and laptops in general. I don't think this is the best tool for your problem.
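To show the shape of the supervised approach described in this reply, here is a toy sketch in plain Python: hand-labeled titles, TF-IDF feature vectors, and a very simple classifier. The nearest-centroid rule is my choice for brevity (the reply does not name a classifier), the six training titles are invented, and a real system would need far more labeled documents and a proper library.

```python
import math
from collections import Counter

# Hypothetical hand-labeled examples: True = about Apple the company.
train = [
    ("Apple launches new iphone", True),
    ("iphone sales lift Apple quarterly income", True),
    ("Apple unveils macbook and ipad updates", True),
    ("Apple juice shown to reduce risk of dementia", False),
    ("Ford recalls trucks over brake fault", False),
    ("Green energy stocks rally on investment news", False),
]

def tokenize(text):
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every training token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(text, idf):
    """L2-normalized TF-IDF vector; tokens unseen in training are ignored."""
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}

def centroid(vectors):
    """Mean of a list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for tok, v in vec.items():
            total[tok] += v
    return {tok: v / len(vectors) for tok, v in total.items()}

def dot(a, b):
    return sum(v * b.get(tok, 0.0) for tok, v in a.items())

idf = fit_idf([text for text, _ in train])
pos = centroid([tfidf(text, idf) for text, label in train if label])
neg = centroid([tfidf(text, idf) for text, label in train if not label])

def about_apple(text):
    """Assign the class whose centroid is closer to the document vector."""
    vec = tfidf(text, idf)
    return dot(vec, pos) > dot(vec, neg)

print(about_apple("iphone sales at a record high"))
print(about_apple("Apple juice linked to dementia risk"))
```

The first test title contains no 'Apple' at all and is still matched through 'iphone' and 'sales', which only works because the labeled examples happen to contain those terms. That is the practical reason the hand-labeled set matters.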
First of all, you don't need Mahout. 3000 documents is close to nothing. Revisit Mahout when you hit a million. I've been processing 100,000 images on a single computer, so you really can skip the overhead of Mahout for now.
What you are trying to do sounds like classification to me, because you have predefined classes.
A clustering algorithm is unsupervised. It will (unless you overfit the parameters) likely break Apple into "iPad/iPhone" and "Macbook". Or on the other hand, it may merge Apple and Google, as they are closely related (much more than, say, Apple and Ford).
Yes, you need training data that reflects the structure you want to measure. There is other structure in the data (e.g. iPhones not being the same as Macbooks, and Google, Facebook and Apple being more similar companies than Kellogg's, Ford and Apple). If you want a company level of structure, you need training data at this level of detail.