Recommendation engine using google-prediction-api? - recommendation-engine

On Google's Prediction API page, it says we can use it for recommendation of web pages / products...
Can someone please show me how? For example:
I have the purchase history of 500,000 members
I have 2,000,000 products in 200 different categories
User-X has just signed up, and I asked him 15 'like'/'dislike' product questions (to gauge his taste)
Now I want to suggest/recommend to user-X a list (e.g., 500) of products he is most likely willing to purchase
Thanks a lot

If you are not specifically tied to the Google API for whatever reason, explore using Mahout. This is a basic use case for Mahout's recommendation mining.
https://cwiki.apache.org/MAHOUT/itembased-collaborative-filtering.html
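To make the approach concrete, here is a minimal NumPy sketch of item-based collaborative filtering, the technique the linked Mahout page describes. This illustrates the algorithm itself, not Mahout's API, and the data is made up:

```python
import numpy as np

# Toy user-item purchase matrix: rows = users, columns = products,
# 1 = purchased (or 'liked'), 0 = no interaction.
purchases = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 0, 0, 0],
], dtype=float)

# Cosine similarity between item columns.
norms = np.linalg.norm(purchases, axis=0)
norms[norms == 0] = 1.0                      # avoid division by zero
item_sim = (purchases.T @ purchases) / np.outer(norms, norms)

def recommend(user_vector, n=3):
    """Score every item by its similarity to the items the user already
    has, mask out items already owned, return the top-n item indices."""
    scores = item_sim @ user_vector
    scores[user_vector > 0] = -np.inf        # don't re-recommend owned items
    return np.argsort(scores)[::-1][:n]

# user-X's 15 like/dislike answers become a sparse preference vector.
user_x = np.array([1, 0, 0, 1, 0], dtype=float)
print(recommend(user_x, n=2))
```

At 500,000 users and 2,000,000 products you would use sparse matrices and a distributed implementation like Mahout's rather than dense NumPy, but the scoring logic is the same.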

The Google Prediction API, as currently implemented, is great for classifying data into a discrete set of categories. However, as noted in the documentation:
Avoid having a high ratio of categories to training data in categorical models. Try to have at least a few dozen examples for each category, minimum. For really good predictions, a few hundred examples per category is recommended.
The Prediction API's classification doesn't work well when the ratio of categories to examples is high. In the example you sketched, the relationship is one-to-one: you are trying to find the user whose liked-product list is most similar to the user of interest (to get a set of promising products to recommend), so in this model each user is a unique category with a single training example.

Related

aws-personalize: can I get recommendations on items not seen in training based on item features?

I am considering using AWS Personalize, or any similar managed recommendation service.
My question is whether it is possible to get recommendations/rankings on items that were not seen in the training data, based on item features. I see that AWS Personalize does have an item-features dataset, but the documentation for the ranking recipe specifically says that items not in the training data are added at the end of any ranking. Of course, new items have no interaction data, so any recipe/algorithm that relies solely on interaction data is not relevant for my case.
My question is whether and how I can apply AWS Personalize to my use case, if at all possible, or whether you know of any recommender service that can handle it.
Yes. There are specific Amazon Personalize recipes designed to support cold-starting items, where a cold item is one without behavioral data in the interactions dataset but with item metadata in the items dataset.
The User-Personalization recipe supports cold starting items through a feature called exploration. You control how much exploration (i.e., recommending cold items) is done with the explorationWeight inference hyperparameter when creating a Personalize campaign or batch inference job. See this blog post for details.
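As a concrete illustration, setting the exploration weight with boto3 when creating a campaign looks roughly like this. The ARN and the values are placeholders, and the exact parameter shape should be checked against the current Personalize docs:

```python
import boto3

personalize = boto3.client("personalize")

# Hypothetical solution version ARN from a User-Personalization training run.
response = personalize.create_campaign(
    name="cold-start-demo-campaign",
    solutionVersionArn="arn:aws:personalize:us-east-1:123456789012:solution/my-solution/abcdef12",
    minProvisionedTPS=1,
    campaignConfig={
        "itemExplorationConfig": {
            # 0.0 = no exploration, 1.0 = maximum exploration of cold items.
            "explorationWeight": "0.5",
            # Only items added within the last 7 days count as 'cold'.
            "explorationItemAgeCutOff": "7",
        }
    },
)
print(response["campaignArn"])
```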
Exploration also applies to the domain recommenders: the "Top picks for you" VOD recommender and the "Recommended for you" e-commerce recommender. You specify the explorationWeight when creating a recommender.
The Similar-Items recipe supports the related-items use case and balances recommending similar items based on behavioral data with thematic similarity between items. You currently cannot control the weighting with this recipe, though. See this blog post for details. The "More like X" VOD recommender provides similar functionality.

Does Facebook's personal ranking algorithm leak external profile data?

I recently came across this script that extracts friend rank data from the currently logged-in Facebook profile and presents it as a table.
After trying the script myself, I became puzzled as to why certain individuals were consistently ranked higher than others. The rank seems to refresh daily, so I have experimented with various user interactions, and this shifts many entries appropriately; however, the same 'certain individual(s)' would often, with no discernible interaction on my part, arbitrarily move up in rank.
My question is this: is it possible that this rank is being affected by other, external profiles' usage data/habits?
In the interests of privacy, it seems very unlikely that anything but personal habits would influence this ranking, but my own and other people's usage anecdotes suggest 'arbitrary' movement that would only be explained by external data.
I cannot seem to find a definitive answer to this elsewhere.
Any input would be greatly appreciated.

How to apply collaborative filtering on no-rating system like Twitter, Facebook

I'm studying collaborative filtering and want to apply it to a social network like Twitter or Facebook. I tried a demo provided by MovieLens and understood that users have to rate items to express their interests, and the ratings are used as input for the recommendation algorithms. However, on social networks like Twitter or Facebook, which have no rating feature, how can I apply these algorithms?
If someone has worked in this area, please give me suggestions.
The keywords you should use in search are "implicit feedback". Luckily there are some good systems/approaches out there that allow you to work with such type of data.
Here is the one I consider the best: https://github.com/benfred/implicit And what's even better, this GitHub page provides links to the articles explaining the theory behind each of the approaches it uses. There are also a couple of tutorials that will help you write your first recommender system in no time. And it's incredibly fast: it took me 2 hours on a quad-core PC to calculate recommendations for 600K users based on 40M entries.
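For a sense of the shape of the code, here is a minimal sketch against a recent release of the library. The fit/recommend signatures have changed between versions, so treat this as illustrative and check the README for your version:

```python
import numpy as np
import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares

# Toy user-item interaction matrix (rows = users, cols = items);
# values are confidence weights derived from implicit feedback.
user_items = sp.csr_matrix(np.array([
    [3, 0, 1, 0],
    [0, 2, 0, 1],
    [1, 0, 0, 3],
], dtype=np.float32))

model = AlternatingLeastSquares(factors=32, regularization=0.01, iterations=15)
model.fit(user_items)

# Top-2 recommendations for user 0; recent versions return (ids, scores).
ids, scores = model.recommend(0, user_items[0], N=2)
print(ids, scores)
```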
Instead of using explicit ratings, you can infer implicit ratings by defining your own weights for actions like:
Twitter: Retweet=1, Save=2, Both=3
Facebook: Like=1, Share=2, Both=3
Using this method, you maintain a 1-3 rating scale that can be fed into a collaborative-filtering algorithm.
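In code, that mapping is just a lookup applied before you build the interaction matrix. A tiny sketch (the event names are illustrative):

```python
# Illustrative event-to-weight mapping for a no-rating platform.
ACTION_WEIGHTS = {"retweet": 1, "save": 2}   # Twitter-style actions
# ACTION_WEIGHTS = {"like": 1, "share": 2}   # Facebook-style actions

def implicit_rating(actions):
    """Sum the weights of a user's actions on one item, so that
    retweet + save = 3, matching the 1-3 scheme above."""
    return sum(ACTION_WEIGHTS.get(a, 0) for a in actions)

print(implicit_rating(["retweet", "save"]))  # -> 3
```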

Classifying items in more than one category

I am developing a news classification system where a particular news item is assigned to an organization or company name. For instance, a news item labelled "Apple to launch new iPhone in september 2012" gets categorized under "Apple" news.
So far, after training the classifier on a bunch of topics such as Apple news, Google news, Microsoft news, Samsung news, Bank of America news, etc., it worked perfectly and I was getting almost 99% correctly classified instances from a single trained model.
Now the problem is classifying a news item such as "Samsung and Google prep attack against Apple" into three topics: "Apple", "Samsung", and "Google".
My question here is: how can I use Mahout's classification to classify a single item into multiple classes? I saw a similar question in this thread: http://mail-archives.apache.org/mod_mbox/mahout-user/201206.mbox/%3C20120607223156.GA26283#opus.istwok.net%3E.
Ted Dunning gave an interesting answer there: make a separate category for each combination of topics. But in my case the combinations are many. I have to classify news into almost 15,000 companies, and realistically speaking any news item can be a mixture of any of those 15,000 companies, so making a separate category per combination is ruled out!
A second suggestion was to arrange topics in a hierarchy, which also does not apply here, as the company names don't converge to any base category.
Having 15,000 models for 15,000 topics would do it, but that does not sound very plausible either!
So what is the correct way to classify multi-topic news?
Thanks!
If you are confronted with the problem of multi-labeling your data, it is better to use a tool that is meant specifically for it. Currently Mahout doesn't support multi-labeling (there are ways to do it, but they are workarounds). Here are a few tools for multi-labeling your data:
http://mulan.sourceforge.net/
http://meka.sourceforge.net/
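Both of these implement strategies such as binary relevance: one binary classifier per label, which scales to thousands of labels without enumerating combinations. As an illustration of the same idea outside those tools, here is a scikit-learn sketch; the headlines and company labels are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

headlines = [
    "Apple to launch new iPhone in september 2012",
    "Samsung and Google prep attack against Apple",
    "Google updates search ranking algorithm",
]
labels = [["Apple"], ["Apple", "Samsung", "Google"], ["Google"]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)              # one binary column per company

vec = TfidfVectorizer()
X = vec.fit_transform(headlines)

# One logistic-regression classifier per label ('binary relevance').
clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)

pred = clf.predict(vec.transform(["Apple sues Samsung over patents"]))
print(mlb.inverse_transform(pred))         # e.g. [('Apple', 'Samsung')]
```

With 15,000 companies you would train 15,000 lightweight binary classifiers inside one model object, which is far cheaper than 15,000 separate multi-class models.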

Geolocation APIs: SimpleGeo vs CityGrid vs PublicEarth vs Twitter vs Foursquare vs Loopt vs Fwix. How to retrieve venue/location information?

We need to display meta information (e.g, address, name) on our site for various venues like bars, restaurants, and theaters.
Ideally, users would type in the name of a venue, along with a zip code, and we would present the closest matches.
Which APIs have people used for similar geolocation purposes? What are the pros and cons of each?
Our basic research yielded a few options (listed in title and below). We're curious to hear how others have deployed these APIs and which ones are ultimately in use.
Fwix API: http://developers.fwix.com/
Zumigo
Does Facebook plan on offering a Places API eventually that could accomplish this?
Thanks!
Facebook Places is based on Factual. You can use Factual's API, which is pretty good (and still free, I think?)
http://www.factual.com/topic/local
You can also use unauthenticated Foursquare as a straight places database. The data is of uneven quality since it's crowdsourced, but I find it generally good. It's free up to a certain API limit, but I think the paid tier is negotiated.
https://developer.foursquare.com/
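For the venue-name-plus-zip lookup described in the question, a userless Foursquare v2 search looked roughly like this at the time of writing. The credentials, version date, and response handling are illustrative; check the current docs:

```python
import requests

# Placeholder credentials; userless v2 access needs client_id/client_secret.
params = {
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
    "v": "20120229",            # API version date
    "near": "94103",            # zip code or place name typed by the user
    "query": "blue bottle",     # venue name typed by the user
    "limit": 5,
}
resp = requests.get("https://api.foursquare.com/v2/venues/search", params=params)
for venue in resp.json()["response"]["venues"]:
    print(venue["name"], venue["location"].get("address", ""))
```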
I briefly looked at Google Places but didn't like it because of all the restrictions on how you have to display results (Google wants their ad revenue).
It's been a long time since this question was asked but a quick update on answers for other people.
This post, right now at least, will not go into great detail about each service but merely lists them:
http://wiki.developer.factual.com/w/page/12298852/start
http://developer.yp.com
http://www.yelp.com/developers/documentation
https://developer.foursquare.com/
http://code.google.com/apis/maps/documentation/places/
http://developers.facebook.com/docs/reference/api/
https://simplegeo.com/docs/api-endpoints/simplegeo-context
http://www.citygridmedia.com/developer/
http://fwix.com/developer_tools
http://localeze.com/
They each have their pros and cons (e.g., Google Places only allows 20 results per query; Foursquare and Facebook Places have semi-unreliable results), which are explained in a bit more detail, although not entirely, at the following link: http://www.quora.com/What-are-the-pros-and-cons-of-each-Places-API
For my own project I ended up deciding to go with Factual's API, since there are no restrictions on what you do with the data (one of the only ToS documents I've read in its entirety). Factual has a pretty reliable API, and as a user of the API you may update, modify, or flag rows of the data. Facebook Places bases its data on Factual's, just another fact to shed some perspective.
Hope I can be of help to any future searchers.
This is not a complete answer, because I haven't compared the given geolocation APIs, but there is also the Google Places API, which solves a similar problem to the other APIs.
One thing about SimpleGeo: the Location API of SimpleGeo mainly supports US (and Canada?) based locations. The last time I checked, my home country, Germany, didn't have many known locations.
Comparison between places-data APIs is tough to keep up to date, given the fast pace of the space and acquisitions like SimpleGeo and HyperPublic changing the landscape quickly.
So I'll just throw in CityGrid's perspective as of February 2012. CityGrid provides 18M US places, allowing up to 10M requests per month for developers (publishers) at no charge.
You can search using a wide range of "what" and "where" (cities, neighborhoods, zip codes, metro areas, addresses, intersections) searches, including lat/long. We have rich data for each place, including images, videos, reviews, offers, etc.
CityGrid also has a developer revenue-sharing program where we'll pay you to display some places, as well as a large mobile and web advertising network.
You can also query places via the CityGrid API using Factual, Foursquare, and other providers' place and venue IDs. We aggregate data from several places-data providers through our system.
Website: http://developer.citygridmedia.com/