How to apply collaborative filtering to a no-rating system like Twitter or Facebook - facebook

I'm studying collaborative filtering and want to apply it to a social network like Twitter or Facebook. I tried some demos built on the MovieLens data and understood that users have to rate items to express their interest, and that those ratings are the input to the recommendation algorithms. However, social networks like Twitter or Facebook have no rating feature, so how can I apply these algorithms there?
If anyone has worked in this area, please give me some suggestions.

The keywords to search for are "implicit feedback". Luckily, there are some good systems and approaches that let you work with this type of data.
The one I consider the best is https://github.com/benfred/implicit. Even better, its GitHub page links to the articles explaining the theory behind each of the approaches it uses. There are also a couple of tutorials that will help you write your first recommender system in no time. It's also incredibly fast: it took me 2 hours on a quad-core PC to calculate recommendations for 600K users based on 40M entries.

Instead of using explicit ratings, you can infer implicit ratings by defining your own weights for actions like:
Twitter: Retweet=1, Save=2, Both=3
Facebook: Like=1, Share=2, Both=3
With this method, you maintain a 1-3 rating scale that can be fed into the collaborative filtering algorithm.
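To make the weighting idea concrete, here is a minimal sketch in plain Python. It assumes the Facebook-style weights above (Like=1, Share=2, Both=3) and uses a simple user-based cosine similarity as the collaborative filtering step. The function names, the event tuples, and the sample users are all made up for illustration; a real system would more likely hand the resulting ratings to a dedicated library.

```python
from collections import defaultdict
from math import sqrt

# Assumed action weights, following the 1-3 scheme described above.
ACTION_WEIGHTS = {"like": 1, "share": 2}


def build_ratings(events):
    """Turn (user, item, action) events into {user: {item: implicit_rating}}."""
    seen = defaultdict(set)
    for user, item, action in events:
        seen[(user, item)].add(action)
    ratings = defaultdict(dict)
    for (user, item), actions in seen.items():
        # Like + Share on the same item adds up to 3 ("Both").
        ratings[user][item] = sum(ACTION_WEIGHTS[a] for a in actions)
    return dict(ratings)


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=5):
    """Score unseen items by the similarity-weighted ratings of other users."""
    scores = defaultdict(float)
    for other, other_items in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_items)
        if sim <= 0:
            continue
        for item, rating in other_items.items():
            if item not in ratings[user]:
                scores[item] += sim * rating
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Made-up sample events.
events = [
    ("alice", "post1", "like"), ("alice", "post1", "share"),
    ("alice", "post2", "like"),
    ("bob", "post1", "share"),
    ("bob", "post3", "like"), ("bob", "post3", "share"),
    ("carol", "post4", "like"),
]
ratings = build_ratings(events)
print(ratings["alice"])
print(recommend(ratings, "alice"))
```

In this sample, alice and bob overlap on post1, so bob's post3 is the only item suggested to alice; carol shares nothing with alice and contributes no score.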

Related

Does Facebook's personal ranking algorithm leak external profile data?

I recently came across this script that extracts friend rank data from the currently logged-in Facebook profile and presents it as a table.
After trying the script personally, I became puzzled as to why certain individuals were consistently ranked higher than others. The rank seems to refresh daily, so I have experimented with various user interactions, and this shifts many entries appropriately; however, the same 'certain individual(s)' would often (with no discernible interaction) arbitrarily move up in rank.
My question is this: is it possible that this rank is being affected by other, external profiles' usage data/habits?
In the interests of privacy, it seems very unlikely that anything but personal habits would influence this ranking, but my own and other people's usage anecdotes seem to suggest 'arbitrary' movement that could only be explained by external data.
I cannot seem to find a definitive answer to this elsewhere.
Any input would be greatly appreciated.

Recommendation engine using google-prediction-api?

On Google's Prediction API page, it says we can use it to recommend web pages / products...
Can someone please show me how, for example:
I have the purchase history of 500,000 members
I have 2,000,000 products in 200 different categories
User-X has just signed up, and I asked him 15 'like' / 'dislike' product questions (the user's taste)
Now I want to recommend to user-X a list (e.g. 500) of products that he is most likely to purchase
Thanks a lot
If you are not specifically tied to the Google API for whatever reason, explore using Mahout. This is a basic use case for Mahout's recommendation mining.
https://cwiki.apache.org/MAHOUT/itembased-collaborative-filtering.html
The Google Prediction API, as currently implemented, is great for classifying data into a discrete set of categories. However, as noted in the documentation:
Avoid having a high ratio of categories to training data in categorical models.
Try to have at least a few dozen examples for each category, minimum.
For really good predictions, a few hundred examples per category is recommended.
The Prediction API's classification doesn't work well when the ratio of categories to examples is high. In the example you sketched, that ratio is one-to-one, because you are trying to find the user whose list of liked products is most similar to that of the user of interest (in order to find a set of promising products to recommend). In this model, each user is a unique category.
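For comparison, the item-based collaborative filtering approach suggested earlier in this thread can be sketched in a few lines of plain Python. This is not Mahout's API and not the Prediction API; it is only an illustration of the idea, assuming binary purchase histories and a cosine similarity over item co-occurrence. The function names, member IDs, and product names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(purchases):
    """Cosine similarity between items, from {member: set_of_product_ids}."""
    counts = defaultdict(int)   # how many members bought each item
    co = defaultdict(int)       # how many members bought both items of a pair
    for items in purchases.values():
        for item in items:
            counts[item] += 1
        for a, b in combinations(sorted(items), 2):
            co[(a, b)] += 1
    sims = defaultdict(dict)
    for (a, b), n in co.items():
        s = n / sqrt(counts[a] * counts[b])
        sims[a][b] = s
        sims[b][a] = s
    return sims


def recommend_for_new_user(sims, likes, dislikes, top_n=500):
    """Rank products for a new user from a handful of like/dislike answers."""
    scores = defaultdict(float)
    for item in likes:
        for other, s in sims.get(item, {}).items():
            scores[other] += s
    for item in dislikes:
        for other, s in sims.get(item, {}).items():
            scores[other] -= s
    answered = set(likes) | set(dislikes)
    ranked = [i for i in scores if i not in answered and scores[i] > 0]
    return sorted(ranked, key=lambda i: (-scores[i], i))[:top_n]


# Made-up purchase histories.
purchases = {
    "m1": {"tent", "stove", "lantern"},
    "m2": {"tent", "stove"},
    "m3": {"novel", "bookmark"},
    "m4": {"novel", "lantern"},
}
sims = item_similarities(purchases)
recs = recommend_for_new_user(sims, likes={"tent"}, dislikes={"novel"})
print(recs)
```

With millions of products, the pairwise co-occurrence step is the expensive part, which is the kind of work a distributed tool such as Mahout is designed to handle.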

Geolocation APIs: SimpleGeo vs CityGrid vs PublicEarth vs Twitter vs Foursquare vs Loopt vs Fwix. How to retrieve venue/location information?

We need to display meta information (e.g., address, name) on our site for various venues like bars, restaurants, and theaters.
Ideally, users would type in the name of a venue, along with zip code, and we present the closest matches.
Which APIs have people used for similar geolocation purposes? What are the pros and cons of each?
Our basic research yielded a few options (listed in title and below). We're curious to hear how others have deployed these APIs and which ones are ultimately in use.
Fwix API: http://developers.fwix.com/
Zumigo
Does Facebook plan on offering a Places API eventually that could accomplish this?
Thanks!
Facebook Places is based on Factual. You can use Factual's API which is pretty good (and still free, I think?)
http://www.factual.com/topic/local
You can also use unauthenticated Foursquare as a straight places database. The data is of uneven quality since it's crowdsourced, but I find it generally good. It's free to a certain API limit, but I think the paid tier is negotiated.
https://developer.foursquare.com/
I briefly looked at Google Places but didn't like it because of all the restrictions on how you have to display results (Google wants their ad revenue).
It's been a long time since this question was asked, but here is a quick update for other people.
This post, right now at least, will not go into great detail about each service but merely lists them:
http://wiki.developer.factual.com/w/page/12298852/start
http://developer.yp.com
http://www.yelp.com/developers/documentation
https://developer.foursquare.com/
http://code.google.com/apis/maps/documentation/places/
http://developers.facebook.com/docs/reference/api/
https://simplegeo.com/docs/api-endpoints/simplegeo-context
http://www.citygridmedia.com/developer/
http://fwix.com/developer_tools
http://localeze.com/
They each have their pros and cons (e.g. Google Places only allows 20 results per query, Foursquare and Facebook Places have semi-unreliable results), which are explained in a bit more detail, although not entirely, at the following link: http://www.quora.com/What-are-the-pros-and-cons-of-each-Places-API
For my own project I ended up deciding to go with Factual's API since there are no restrictions on what you do with the data (one of the only ToS that I've read in its entirety). Factual has a pretty reliable API, and as a user of the API you may update, modify, or flag rows of the data. Facebook Places bases its data on Factual's, just another fact to shed some perspective.
Hope I can be of help to any future searchers.
This is not a complete answer, because I haven't compared the given geolocation APIs, but there is also the Google Places API, which solves a similar problem to the other APIs.
One thing about SimpleGeo: the Location API of SimpleGeo mainly supports US (and Canada?) based locations. The last time I checked, my home country Germany didn't have many known locations.
Comparisons between places data APIs are tough to keep up to date, given the fast pace of the space and acquisitions like SimpleGeo and HyperPublic changing the landscape quickly.
So I'll just throw in CityGrid's perspective as of February 2012. CityGrid provides 18M US places, allowing up to 10M requests per month for developers (publishers) at no charge.
You can search using a wide range of "what" and "where" (Cities, Neighborhoods, Zip Codes, Metro Areas, Addresses, Intersections) searches including latlong. We have rich data for each place including images, videos, reviews, offers, etc.
CityGrid also has a developer revenue sharing program where we'll pay you to display some places, as well as a large mobile and web advertising network.
You can also query Places via the CityGrid API using place and venue IDs from Factual, Foursquare and other places providers. We aggregate data from several places data providers through our system.
Website: http://developer.citygridmedia.com/

Trying to get facebook/twitter/myspace statuses and other data for statistics

I was wondering if anyone knows how to gather data from millions of people around the globe via these social networks in order to get the statistics. I need this for a project I'm trying to do and do not need to know the actual person posting such information (such as statuses, comments, information about them, etc) so as not to break any data privacy laws.
I need to know things like how many people commented about Obama today and what was their sex (female or male) and things like that.
Is that possible in any way?
Thanks a million
I think you're asking if there are any resources to mine for social data.
Your best bet is to check out the Twitter or Facebook APIs. Variables like age, sex, location will probably be far more difficult to ascertain than raw status info, but it can be done.
For Twitter, I would recommend using the Twitter streaming API and filtering for specific keywords.
For MySpace, use the Real-Time Stream PUSH feature: http://wiki.developer.myspace.com/index.php?title=Category:Real_Time_Stream
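The keyword-filtering idea for a status stream can be sketched as follows. This does not call the Twitter or MySpace APIs; it only assumes that statuses arrive as dictionaries with a "day" and a "text" field (a made-up shape for illustration) and counts how many mention each keyword per day. The sample statuses are invented.

```python
from collections import Counter
from datetime import date


def count_mentions(statuses, keywords):
    """Count statuses per (day, keyword) using a case-insensitive substring match."""
    wanted = [k.lower() for k in keywords]
    counts = Counter()
    for status in statuses:
        text = status["text"].lower()
        for keyword in wanted:
            if keyword in text:
                counts[(status["day"], keyword)] += 1
    return counts


# Invented sample data standing in for a real stream.
sample = [
    {"day": date(2012, 1, 5), "text": "Obama speaks tonight"},
    {"day": date(2012, 1, 5), "text": "Lunch was great"},
    {"day": date(2012, 1, 6), "text": "Watching the obama interview"},
]
counts = count_mentions(sample, ["Obama"])
for (day, keyword), n in sorted(counts.items()):
    print(day, keyword, n)
```

Demographic fields such as sex are usually not part of a raw status, so any breakdown by them would need extra profile data that this sketch does not model.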
The best tool so far for getting data from most social network sites is this; build an algorithm that suits your needs from the data you collect.

Anyone have a link to a technical discussion of anything akin to the Facebook news feed system?

I'm looking for a presentation, PDF, blog post, or whitepaper discussing the technical details of how to filter down and display massive amounts of information for individual users in an intelligent (possibly machine learning) kind of way. I've had coworkers hear presentations on the Facebook news feed but I can't find anything published anywhere that goes into the dirty details. Searches seem to just turn up the controversy of the system. Maybe I'm not searching for the right keywords...
@AlexCuse I'm trying to build something similar to Facebook's system. I have large amounts of data and I need to filter it down to something manageable to present to the user. I cannot use another website due to the scale of what I've got to work at. Also I just want a technical discussion of how to implement it, not examples of people who have an implementation.
Are you looking for something along the lines of distributed pub/sub with content-based filtering? If so, you may want to look into Siena and some of the associated papers, such as "Design and Evaluation of a Wide-Area Event Notification Service".
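As a rough illustration of what content-based pub/sub means, here is a minimal in-process sketch. It is not Siena's API and says nothing about how Facebook implements its feed; the Broker class, the constraint tuples, and the sample events are all hypothetical. Subscribers register attribute constraints, and an event is delivered only to subscribers whose constraints all match.

```python
import operator

# Supported comparison operators for subscription constraints.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


class Broker:
    """Single-process broker that routes events by their content."""

    def __init__(self):
        # Each entry is (subscriber, [(attribute, op, value), ...]).
        self.subscriptions = []

    def subscribe(self, subscriber, constraints):
        self.subscriptions.append((subscriber, constraints))

    def publish(self, event):
        """Return the subscribers whose every constraint matches the event."""
        matched = []
        for subscriber, constraints in self.subscriptions:
            if all(
                attr in event and OPS[op](event[attr], value)
                for attr, op, value in constraints
            ):
                matched.append(subscriber)
        return matched


broker = Broker()
broker.subscribe("ann", [("type", "=", "photo"), ("likes", ">", 10)])
broker.subscribe("ben", [("type", "=", "status")])

print(broker.publish({"type": "photo", "likes": 25, "author": "carl"}))
print(broker.publish({"type": "photo", "likes": 3}))
```

A distributed system would spread the subscription table across many brokers and avoid scanning every subscription per event, which is the problem the wide-area designs address.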