I'd like to play around with building a recommendation system, and by that I mean an algorithm that looks at preferences and/or reviews posted by a user and then makes recommendations for them, similar to what Netflix or Amazon use.
What are some good resources for learning how to write something like this? Where should I start?
Check out the Wikipedia page on the Netflix Prize and its discussion forum. Also, the somewhat related 2009 GitHub Contest is a good source for full source code on a number of different recommendation engines. And obviously there's also the Wikipedia page on the topic itself, which has some decent links.
If you start writing your own, you'll want to use a corpus. I'd actually recommend using the Netflix Prize's data set. Just carve the data set into two pieces. Train on the first piece and score your algorithm on the second piece.
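To make the train/score split concrete, here's a minimal sketch in Python. The file name and the (user, item, rating) layout are assumptions; the actual Netflix Prize data ships in its own per-movie format, so you'd adapt the parsing:

    import random

    random.seed(42)  # make the split reproducible

    train, test = [], []
    # "ratings.csv" and its (user, item, rating) layout are assumptions;
    # adapt the parsing to however your corpus is actually laid out.
    with open("ratings.csv") as f:
        for line in f:
            user, item, rating = line.strip().split(",")
            record = (user, item, float(rating))
            # Hold out roughly 20% of the ratings for scoring.
            (test if random.random() < 0.2 else train).append(record)

    # Train on `train`, then score on `test`, e.g. by computing the RMSE
    # between your predicted ratings and the held-out actual ratings.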
Addendum: A somewhat related and scary application of this sort of thing is predicting demographic information: a user's gender, age, household income, IQ, sexual orientation, etc. You could probably predict most of these attributes from the Netflix Prize dataset with a fairly high degree of accuracy. Fortunately, everyone in that dataset is just a number.
Take a look at pysuggest, a Python library that implements a variety of recommendation algorithms for collaborative filtering (which is used by Amazon.com).
How are features compared on large systems? For example, when I search on Google, does Google compare my request against all the websites? Or, on specific platforms like Netflix or YouTube, does the system scan all the videos one by one to figure out how good each video is for me?
It doesn't work like that; it's done with machine learning.
The system takes a lot of information, derives similarity data from other users, and then applies those patterns to your choices.
That's a great question!
Not all of the services you listed work the same way. Google does something called indexing, which is basically storing websites in a way where they can be looked up much more efficiently. Discord also does this with messages, which you may have noticed if you use it.
Netflix has a catalog of only a few thousand titles, whereas there are millions and millions of websites, and probably billions of Google-indexed pages. So a Netflix search is much simpler and probably doesn't require much indexing or fanciness.
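To give a feel for what indexing buys you, here's a toy inverted index in Python; it's a sketch of the general idea, not how Google actually does it:

    from collections import defaultdict

    # Toy inverted index: map each word to the set of documents containing
    # it, so a query never has to scan every document one by one.
    docs = {
        1: "cheap flights to tokyo",
        2: "tokyo ramen guide",
        3: "learn to fly a kite",
    }

    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)

    def search(query):
        # Intersect the posting sets of all query words.
        sets = [index[word] for word in query.split()]
        return set.intersection(*sets) if sets else set()

    print(search("tokyo"))     # {1, 2}
    print(search("to tokyo"))  # {1}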
If you're interested in the recommendation algorithms sites like Netflix and YouTube use, you might want to look into Collaborative Filtering. It's a pretty simple algorithm, and it's really interesting.
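To give a taste of it, here's a minimal user-based collaborative-filtering sketch in Python; the ratings are made up, and real systems add weighting, normalization, and a lot of engineering for scale:

    # Minimal user-based collaborative filtering: score unseen items for a
    # target user by how similar other users' tastes are.
    ratings = {
        "alice": {"Movie A": 5, "Movie B": 3, "Movie C": 4},
        "bob":   {"Movie A": 5, "Movie B": 3, "Movie D": 5},
        "carol": {"Movie B": 1, "Movie C": 2, "Movie D": 4},
    }

    def similarity(u, v):
        # Cosine-style similarity over the items both users rated.
        common = set(ratings[u]) & set(ratings[v])
        if not common:
            return 0.0
        dot = sum(ratings[u][i] * ratings[v][i] for i in common)
        norm_u = sum(r * r for r in ratings[u].values()) ** 0.5
        norm_v = sum(r * r for r in ratings[v].values()) ** 0.5
        return dot / (norm_u * norm_v)

    def recommend(user):
        scores = {}
        for other in ratings:
            if other == user:
                continue
            sim = similarity(user, other)
            # Weight each unseen item's rating by how similar the rater is.
            for item, r in ratings[other].items():
                if item not in ratings[user]:
                    scores[item] = scores.get(item, 0.0) + sim * r
        return sorted(scores, key=scores.get, reverse=True)

    print(recommend("alice"))  # ['Movie D']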
I know the use case might be specific, but more and more work in every industry sector is being digitized, and so is the communication between departments that sometimes speak very different languages. I searched the internet, but I wasn't able to find a clear answer (either I didn't find the right search phrases, or the internet itself just doesn't know).
Here's my scenario: I'm working with several departments that work with diagrams (for example, a lighting setup). Such a diagram serves several purposes:
which devices are used?
where are they placed?
where are they pointing?
how are they configured (e.g. exposure)?
They tend to export their finalized diagram as either an image or a PDF, which is fine if you want to print it out but considerably less helpful if another department (mine) has to work with the raw information. That's why I wonder whether there's some kind of industry-standard format (SVG, XML, JSON, etc.) that is both supported by the programs these departments use and can be interpreted by a programming language. Do you know of anything like that?
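To illustrate what I'm after: SVG, for instance, is plain XML, so if the tools exported something like the following (the element names and data- attributes here are purely hypothetical), a script could read the raw information back out:

    import xml.etree.ElementTree as ET

    # Purely hypothetical example: if the diagram tool exported SVG with
    # one element per device, the raw information is machine-readable.
    svg = """<svg xmlns="http://www.w3.org/2000/svg">
      <circle cx="120" cy="80" r="10" id="spot-1"
              data-device="spotlight" data-angle="45" data-exposure="0.8"/>
    </svg>"""

    root = ET.fromstring(svg)
    ns = {"svg": "http://www.w3.org/2000/svg"}
    for el in root.findall("svg:circle", ns):
        print(el.get("id"), el.get("data-device"),
              (el.get("cx"), el.get("cy")),  # where it's placed
              el.get("data-angle"),          # where it's pointing
              el.get("data-exposure"))       # how it's configured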
Thanks in advance!
I'm trying to come up with the largest possible group of friends that would theoretically get along with each other, i.e., each person in the group should know at least 50% of the other people in the group.
I'm trying to come up with an algorithm for this that doesn't take ridiculously long; Facebook's API/cross-server talk is pretty slow as is.
I was thinking I could start with the friend that has the most mutual friends with me first, and then add people to the group one by one. But who would I choose next?
Just interested in the theory, no code is necessary.
Edit: When I said "theory", what I really meant was: what's the next logical step, in plain English? :) I was hoping I could code this up in an afternoon, but I guess this is a bit more complicated than I anticipated, and I'm not sure I want to spend weeks delving into heavy graph theory. Nevertheless, maybe someone else will find this interesting.
MIT did some work on social graphing a while back. Although it used mobile-phone data, the clustering algorithms and other techniques should still apply, even if they were built around different inputs and criteria.
There is more MIT chatter about social graphing going on at the moment. Definitely the place to look for technical pointers on this kind of thing.
While graph enumeration from a given node across its edges is NP-complete for most useful problems, applying graph traversal together with the wealth of available information might help you make this more efficient:
For any node (profile) N, you could data-scrape using Google or something similar to find the associated outgoing edges. That way you can harness Google's page cache and search technology instead of having to traverse the edges yourself.
Social profiles contain tons of metadata. Developing a statistical method for estimating the likelihood of A knowing B without a direct path might be useful (see the sketch below). After all, friends tend to have a) similar locations and b) similar interests.
Other, seemingly irrelevant data can provide a means of locating people who are likely to know each other, whose edges you can then double-check: things such as chatter on message boards about a band or gig, or people mentioning "cat fight" when Kate smacked Mary in the mouth.
The data just needs looking at in the right way, in the same way MIT looked at geographical statistics to determine relationships through phones.
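As a rough illustration of that statistical angle, a naive sketch could score the likelihood of A knowing B from profile-metadata overlap; the fields and weights below are invented:

    # Naive sketch: estimate the likelihood that A knows B from profile
    # metadata overlap. The fields and weights are invented for illustration.
    def know_likelihood(a, b):
        score = 0.0
        if a["location"] == b["location"]:
            score += 0.5  # (a) similar locations
        shared = set(a["interests"]) & set(b["interests"])
        union = set(a["interests"]) | set(b["interests"])
        if union:
            # (b) similar interests, as Jaccard overlap
            score += 0.5 * len(shared) / len(union)
        return score

    alice = {"location": "London", "interests": ["indie gigs", "climbing"]}
    bob = {"location": "London", "interests": ["indie gigs", "chess"]}
    print(know_likelihood(alice, bob))  # 0.666...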
Good Luck
There is an algorithm called SCAN; with some precalculation it can cluster a network at good speed.
You can find information about the algorithm here: SCAN: A Structural Clustering Algorithm for Networks
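For a flavor of how it works: the core of SCAN is a structural similarity between adjacent nodes, where each node's neighborhood includes the node itself, and nodes whose similarity exceeds a threshold end up clustered together. A rough sketch:

    # Core of SCAN: structural similarity between two nodes, where each
    # node's neighborhood includes the node itself. Nodes sharing enough
    # neighbors (similarity above a threshold) land in the same cluster.
    graph = {  # adjacency sets for a toy friendship graph
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b", "d"},
        "d": {"c"},
    }

    def structural_similarity(u, v):
        nu = graph[u] | {u}
        nv = graph[v] | {v}
        return len(nu & nv) / (len(nu) * len(nv)) ** 0.5

    print(structural_similarity("a", "b"))  # 1.0: a and b share the same circle
    print(structural_similarity("c", "d"))  # ~0.71: d hangs off the edge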
This is more "broad", but see if it helps to get ideas.
I was wondering if there exist any open-source frameworks that would help me add the following types of functionality to my website:
1) If I am viewing a particular product, I would like to see what other products may be interesting to me. This information might be deduced by calculating, for example, what other people in my region (or sharing any other characteristic of my profile) bought in addition to the product that I am viewing. Kind of like what Amazon.com does.
2) Deduce relationships between people based on their profiles, their interaction with one another on the website (via commenting on one another's posts, for example), their use of the website in terms of the areas they navigate most, products bought in common, etc.
I am not looking for an open-source website with this functionality, but something like an object model into which I can feed information about users and their use of the site, including rules about relationships, and then at a later point ask it the questions described in (1) and (2) above.
Any pointers to white papers or general information about the best approaches for doing this, or any related links, would really help too.
(I am the developer of Taste, which is now part of Apache Mahout)
1) You're really asking for two things here:
a) Recommend items I might like
b) Favor items that are similar to the thing I am currently looking at.
Indeed, Mahout Taste is all about answering a). Everything it does supports systems like this. Take a look at the documentation to get started, and ask any questions on mahout-user@apache.org.
For 1b) in particular, Mahout has two answers:
If you are only interested in what items are similar to the current item, you would be interested in the ItemSimilarity abstraction in Mahout (org.apache.mahout.cf.taste.similarity.ItemSimilarity) and its implementations, like PearsonCorrelationSimilarity. Based on a set of user-item ratings, this could tell you an estimated similarity between any two items. You'd then just pick the most similar items. In fact, look at the TopItems class in Mahout which can just figure this for you quickly.
But also, you can combine a) and b) by computing recommendations, then applying a Rescorer implementation which then favors items that are similar to the currently-viewed item.
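If it helps to see the idea outside of Java, here is a rough Python sketch of Pearson item similarity; it illustrates the concept behind ItemSimilarity/PearsonCorrelationSimilarity, not Mahout's actual API, and the ratings are invented:

    # Rough illustration of Pearson item similarity (the concept behind
    # Mahout's PearsonCorrelationSimilarity, not its actual API).
    ratings = {  # user -> {item: rating}; values invented for illustration
        "u1": {"A": 5, "B": 4, "C": 1},
        "u2": {"A": 4, "B": 5, "C": 2},
        "u3": {"A": 1, "B": 2, "C": 5},
    }

    def pearson_item_similarity(i, j):
        # Use only ratings from users who rated both items.
        pairs = [(r[i], r[j]) for r in ratings.values() if i in r and j in r]
        n = len(pairs)
        if n < 2:
            return 0.0
        xs, ys = zip(*pairs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in pairs)
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        if sx == 0 or sy == 0:
            return 0.0
        return cov / (sx * sy)

    print(pearson_item_similarity("A", "B"))  # ~0.84: rated alike
    print(pearson_item_similarity("A", "C"))  # -1.0: rated oppositely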
2) Yes, likewise, you would be interested in the UserSimilarity abstraction, its implementations, etc. This would deduce similarities based on item ratings. Mahout, however, does not help you deduce those ratings by, say, looking at user behavior. That part is domain-specific and up to you.
Sounds confusing? Read the docs and feel free to follow up on mahout-user@apache.org, where I can tell you more.
I am researching the same topic, as I'm working on a project to help people decide how to vote on California's complicated ballot measures. Here are some open-source collaborative filtering engines that I've found:
Vogoo (PHP)
acts_as_recommendable (Ruby on Rails)
Mahout (formerly Taste) (Java)
There's also a good overview of these engines here.
There are also the Duine framework and OpenSlopeOne.
But in my opinion, Mahout is still the best.
You can find a survey about Open Source Recommender Systems here:
http://girlincomputerscience.blogspot.com.br/2012/11/open-source-recommendation-systems.html
Hope it helps!
You can find a List of Recommender Systems here
I will be entering my third year of university in the next academic year, once I've finished my placement year as a web developer, and I would like to hear some opinions on the two modules in the title: Human-Computer Interaction (HCI) and Artificial Neural Networks.
I'm interested in both, however I want to pick one that will be relevant to my career and that I can apply to systems I develop.
I'm doing an Internet Computing degree; it covers web development, networking, database work, and programming. Though I had set myself on becoming a web developer, I'm not so sure about that any more, so I'm trying not to limit myself to that area of development.
I know HCI would help me as a web developer, but do you think it's worth it? Do you think Neural Network knowledge could help me realistically in a system I write in the future?
Thanks.
EDIT:
I thought it would be useful to follow up with what I decided to do and how it's worked out.
I picked Artificial Neural Networks over HCI, and I've really enjoyed it. Having a peek into cognitive science and machine learning has ignited my interest in the subject area, and I hope to take on a postgraduate project a few years from now, when I can afford it.
I have got a job which I am starting after my final exams (which are in a few days) and I was indeed asked if I had done a module in HCI or similar. It didn't seem to matter, as it isn't a front-end developer position!
I would recommend taking the module if you have it as an option, as well as any module covering biological computation; it will open up more doors should you want to go on to postgraduate research in the future.
The worthiness depends on three factors:
How familiar are you with the topic already?
How good is the course/class you want to take?
What are you more interested in?
Especially for HCI, there is a broad range of "common sense" information that you could just as easily obtain from reading a good book or a range of articles published on the internet. On the other hand, there are indeed many deeper insights, mostly obtained from psychology studies. If the course is done right, you can learn a lot about the topic and the real considerations that go into designing an interface.
As for Neural Networks, one has to say that this is a typical hype topic. The main question is which application domain the course uses to approach neural networks. You can be quite sure that you won't program or use any neural networks for web development. On the other hand, if the course is done right, it could be a good opportunity to broaden your knowledge, especially to deepen your understanding of the theory of computer science. That highly depends on how the course is laid out, though.
HCI is a topic that helps your career as a web developer, but only if you feel incompetent in that area (then it is a must) or if the course is done very well. Neural Networks has more potential to be really interesting, hardcore computer science, where you genuinely come away understanding something better. If you are interested in NN, you should not pass up the opportunity to get an education that is not narrowly focused on web development; after all, you might find more interest in other areas (it is always good to know what other directions you might like to take in the future).
Neural networks sound cool until you read the fine print:
"In modern software implementations of artificial neural networks the approach inspired by biology has more or less been abandoned for a more practical approach based on statistics and signal processing."
This is something that has mystified me for years. Here you have an amazingly complex and powerful control system (real-world biological neural networks), and an academic discipline that appears to be about modeling these systems in software but that has in reality abandoned that activity.
If you're doing web development, your time is probably better spent in the HCI course.
Go with what interests you the most. The HCI stuff will be much easier to pick up later as needed; you'll likely never get another chance to learn about neural networks!
For prospective employers (at least the good ones!) you need to show a passion and excitement about what you do. I'd sooner hire someone who can enthusiastically talk about neural networks than someone who has an extra credit in HCI.
Unless you want to go into the research end of things, i.e., get a Master's/PhD, go with HCI.
I studied Neural Computation at university when I studied AI. I now run my own company. The number of times I have used my NN skills since I studied equals zero. I'm glad I did it, as it was quite fascinating, but from where I am now, I would have found HCI much more useful. I think you'd pick up a lot more insight relevant to the software industry from an HCI course, but if you think your experience should be more on the esoteric, almost arty side of development, go for NN.
Which sounds like more fun? Or, equivalently, which will you work harder at? Pick that one.
I did two courses in NN and some other AI courses. It's fun to poke around with that stuff, and I actually managed to use it in some of the things I've done, like face recognition; it's also useful in some other areas, e.g. if you want to plot your lab data. I have never used NNs in my web-development career, though. I'm sure they could be used for something, but what it really boils down to is finding a client or employer willing to pay for it when you could just take the straight path. So I would rather read a book about it, if I weren't that hardcore about it.
Fundamentals of Neural Networks doesn't require too much knowledge of math, and it's what I used in my first course.
As a programmer-to-be, you need knowledge of neural networks. If parallel processing is the way forward in hardware, then future programmers must be knowledgeable about neural networks. Don't forget that NNs work better with noisy or imprecise data, where other systems may not. Note that most data we use for analysis is sample data, a fraction of the whole, and you can imagine what happens if some of the sample is way off. So you need knowledge of NNs if you want to last in the field of computer programming.