How do recommendation systems deal with very large data? - recommendation-engine

How are features compared on large systems? For example, when I search on Google, does Google compare my request against all the websites? Or take specific platforms like Netflix or YouTube: do they scan all the videos one by one to work out how good each video is for me?

It doesn't work like that. It is done with machine learning.
The system takes a lot of information, works out similarities from other people's data, and then applies those similarities to your choices.

That's a great question!
Not all of the services you listed work the same way. Google does something called indexing, which is basically storing websites in a way where they can be looked up much more efficiently. Discord also does this with messages, which you may have noticed if you use it.
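To make the indexing idea concrete, here is a minimal sketch of an inverted index in Python. This is only an illustration of the general technique, not how Google or Discord actually implement it; the function names and sample documents are made up for the example. The point is that a query only touches the entries for its own words instead of scanning every document.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the documents matching the first word, then narrow down.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample data, only for illustration.
docs = {
    1: "cheap flights to paris",
    2: "paris travel guide",
    3: "cheap hotels in rome",
}
index = build_index(docs)
print(search(index, "cheap paris"))
```

The index is built once, ahead of time. Each search is then a few dictionary lookups and set intersections, however many documents exist.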
Netflix has about 2000 shows, whereas there are millions and millions of websites, and probably billions of Google-indexed pages. So doing a Netflix search is much simpler and probably doesn't require much indexing or fanciness.
If you're interested in the recommendation algorithms sites like Netflix and YouTube use, you might want to look into Collaborative Filtering. It's a pretty simple algorithm, and it's really interesting.
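As a rough sketch of what user-based collaborative filtering looks like, here is a small Python example. It assumes ratings are stored as plain dictionaries and uses cosine similarity; the names and the sample data are invented for illustration, and real services use far more elaborate variants.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two {item: rating} dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # skip items the user already rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"Alien": 5, "Heat": 4},
    "bob": {"Alien": 5, "Heat": 4, "Blade Runner": 5, "Notting Hill": 1},
    "carol": {"Notting Hill": 5, "Amelie": 4, "Heat": 1},
}
print(recommend(ratings, "alice"))
```

Bob's tastes match Alice's closely, so his favourite that she hasn't rated comes out on top. Note that this naive version compares the user against every other user; at scale the similarities are usually precomputed or approximated.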

Related

Real-time auction updates - Comet? Tornado? ActiveMQ?

I'm in the process of deciding how to write an online auction application. I would like to provide real-time updates to the site users. My background is with LAMP (although, in my case, the 'P' would be more for Perl than PHP). I've considered ActiveMQ, but I'm wondering if there are better options.
My primary concerns are scalability and speed. It could have several simultaneous auctions taking place, with [hopefully] many users participating in each auction. Whatever solution that I decide on would have to accommodate such a scenario. Of course, this is all in theory so I have no idea how many concurrent users that I might have, but I'd like to have the means to support tens of thousands of users.
Another concern is ease of implementation. I've spent the past few days reading docs and tutorials and, so far, nothing has come across as anything less than a bit of a pain in the rear to deal with, which is actually what has led me here to seek some advice.
I was hoping to use a web framework, such as CodeIgniter (PHP) or Catalyst (Perl), because I intend to pay a contractor or two to help with some of the bulk of the coding, and I like the idea of having a framework to somewhat enforce a design pattern. However, the more that I look into this, I'm just not seeing an obvious solution to 1) use a framework, and 2) provide real-time auction updates (other than Tornado, I guess - maybe I'm answering my own question. ;)).
So, with all that said, short of using polling (which I'm not really interested in doing), is there a way that I can accomplish these real-time updates using a language like Perl or PHP for my server-side code? I know that ActiveMQ supports STOMP, and I actually have this working on my local machine (using Jetty since it requires a servlet to publish/consume messages from client-side JavaScript), but is there a better option here?
I'm sorry that I don't have a more direct question, but after several days of looking at docs and tutorials, I'm more lost than ever!
Part of your problem is that you're mixing a variety of concepts together. If I read things correctly you have a problem statement of:
I'm building an online auction site and would like to ensure that my visitors have real-time updates of prices on the items they are viewing.
Between the browser and the server you'll probably use a Comet-style request pattern to handle communications; you could also look at socket.io as a backup pattern. This polling requires a server that can handle lots of simultaneous open connections, for which Tornado is a good candidate (there are others, but given that you asked in relation to Tornado, it's a good one).
Now that we've gone from 1000+ browsers to a handful of Tornado servers, you need a way to communicate between them. Among publish/subscribe messaging options you have a few choices:
RabbitMQ (AMQP)
ZeroMQ
Redis Pub/Sub
All three are good choices, with their own pros and cons. Personally I've used Redis and Rabbit on different projects and just toyed with ZeroMQ. The message broker is a whole decision tree that is going to be based on what you have available.
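To show what the publish/subscribe pattern itself looks like, here is a minimal in-memory sketch in Python. It is not the API of RabbitMQ, ZeroMQ or Redis; the `Broker` class and the channel name are hypothetical, and a real broker would deliver messages across processes and machines rather than calling functions in the same process.

```python
from collections import defaultdict

class Broker:
    """Toy in-process broker: channels map to lists of subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Register a callback to be invoked for every message on the channel."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver the message to all subscribers; return how many received it."""
        callbacks = list(self._subscribers.get(channel, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)

# Two hypothetical web servers, each holding connections for auction 42.
broker = Broker()
server_a_inbox, server_b_inbox = [], []
broker.subscribe("auction:42", server_a_inbox.append)
broker.subscribe("auction:42", server_b_inbox.append)

# A new bid is published once and fans out to every subscribed server.
delivered = broker.publish("auction:42", {"item": 42, "bid": 105})
print(delivered, server_a_inbox, server_b_inbox)
```

The publisher does not need to know which servers are listening. Each web server subscribes to the auctions its connected browsers are watching and pushes incoming messages down the open Comet connections.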

Which CMS, if any, would be best suited for a database-driven website?

For educational purposes, I am delving into some web development. What I have in mind right now is a website where users can submit as well as view benchmark scores for CPUs, GPUs etc. As is evident, this will be heavily driven by a database which will store all the scores etc.
I have programming experience with OOP (C++, C#), and am not too worried about picking up PHP. However, I feel intimidated by front-end design (HTML, CSS etc.), and for that reason am shying away from developing the website from scratch.
I'm using MS WebMatrix, but I'm not sure which CMS will be best suited for me. Currently, I've reviewed the following: DotNetNuke, Umbraco, Joomla, Drupal; but haven't been able to pinpoint one yet.
Any suggestions which will be best suited for my kind of website?
The most widespread CMSs, like WordPress and Drupal (and others), are extensible, meaning that you can create your own content types following the workflow imposed by each one's architecture. So the best suited for you will be the one that takes the least time to learn.
I would recommend WordPress, because I found that the learning curve is minimal if you can read its PHP source code; that is, there is no need to read a book in its nth edition cover to cover.
This page is a good starting point for creating a post type for benchmarks. But again, you could accomplish the same with another CMS, say Drupal. A sibling site of SO is devoted solely to WordPress.
Hope that helps!

CMS framework suitable for educational website

I am building an educational site for carpentry.
It will contain lots of articles and tutorials sorted in different categories.
I am looking for a suitable CMS and/or theme.
The focus is on content and ease of organizing lots of links in categories. So I don't care about beautiful visual design, but rather a neat way to organize a lot of information on a topic.
Thank you!
Hmmm what's the world's leading example of a lot of information on just about every topic? Wikipedia. And the software that runs Wikipedia, 'mediawiki', is free and open-source. You could try that.
It might take a bit of getting used to, since there's no hierarchy of URLs (so for example you don't have /tutorials/cabinets/tutorial1.html) but everything can be in multiple categories by tagging: [[category:tutorial]] [[category:cabinetmaking]] etc. And the power comes in search and rich linking.
I put this out as an alternative to the standard Drupal/Joomla/LAMP-stack CMSs that other people will doubtless suggest.
Educational site? Maybe you're looking for an LMS instead?
Check edu 2.0 at http://www.edu20.org/
If you don't need the specific features of an LMS - grading, testing, etc then Joomla or Drupal would easily be able to do what you are looking for. Both are very good at organizing content.

Are there any medium-sized web applications built with CGI::Application that are open-sourced?

I learn best by taking apart something that already does something and figuring out why decisions were made in which manner.
Recently I've started working with Perl's CGI::Application framework, but found I don't really get along well with the documentation (too little information on how to best structure an application with it). There are some examples of small applications on the cgi-app website, but they're mostly structured to demonstrate a small feature, and consist mostly of code that one would never actually use in production. Other examples are massively huge and would require way too much time to dig through. And most of them are just stuff that runs on cgiapp, but isn't open source.
As such I am looking for something that has most base functionality like user logins, db access, some processing, etc.; is actually used for something, but is not so big that it would take hours to even set it up.
Does something like that exist or am I out of luck?
CGI::Application tends to be used for small, rapid-development web applications (much like Dancer, Maypole and other related modules). I haven't seen any real examples of open-source web apps built on top of it, though perhaps I'm not looking hard enough.
You could look at Catalyst. The wiki has a list of Catalyst-powered software and there are a large number of apps there - poke around, see if you like the look of the framework. Of course, this is Perl, so some of those apps will be using Template::Toolkit, some will use HTML::Mason... still, you'll get a general idea.
Try looking at Miril CMS, although I don't know what state it is in.
I am the same with code, and had the same request. When I did not find a solution I created my own, which is https://github.com/alexxroche/Notice
I hope that it is a good solution to this request.
Notice demonstrates:
CGI::Application
CGI::Application::Plugin::ConfigAuto
CGI::Application::Plugin::AutoRunmode
CGI::Application::Plugin::DBH
CGI::Application::Plugin::Session
CGI::Application::Plugin::Authentication
CGI::Application::Plugin::Redirect
CGI::Application::Plugin::DBIC::Schema
CGI::Application::Plugin::Forward
CGI::Application::Plugin::TT
It comes with an example MySQL schema, but because of DBIC::Schema it can be used with PostgreSQL (or anything else that DBIx::Class supports).
I have used Notice in all of my real-life applications since 2007. The version on GitHub is everything except the branding and the content.
Check out the Krang CMS.

Where can I learn about recommendation systems?

I'd like to play around with building a recommendation system, and by that I mean an algorithm that looks at preferences and/or reviews posted by a user and then makes recommendations for them, similar to what Netflix or Amazon use.
What are some good resources for learning how to write something like this? Where should I start?
Check out the Wikipedia page on the Netflix Prize and its discussion forum. Also, the somewhat related 2009 GitHub Contest is a good source for full source code on a number of different recommendation engines. And obviously there's also the Wikipedia page on the topic itself, which has some decent links.
If you start writing your own, you'll want to use a corpus. I'd actually recommend using the Netflix Prize's data set. Just carve the data set into two pieces. Train on the first piece and score your algorithm on the second piece.
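The train-and-score procedure above can be sketched in a few lines of Python. This assumes ratings are held as (user, item, rating) tuples and scores with RMSE, the metric the Netflix Prize used; the function names, the 80/20 split and the synthetic data are assumptions for illustration, not part of any real data set's tooling.

```python
import random
from math import sqrt

def split(ratings, test_fraction=0.2, seed=0):
    """Shuffle the ratings and carve them into a training and a test piece."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

def rmse(predict, test):
    """Root mean squared error of predict(user, item) over the test piece."""
    errors = [(predict(user, item) - rating) ** 2 for user, item, rating in test]
    return sqrt(sum(errors) / len(errors))

def make_mean_predictor(train):
    """Baseline 'algorithm': always predict the mean training rating."""
    mean = sum(rating for _, _, rating in train) / len(train)
    return lambda user, item: mean

# Synthetic (user, item, rating) tuples standing in for a real corpus.
ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(10) for i in range(10)]
train, test = split(ratings)
baseline = make_mean_predictor(train)
print(len(train), len(test), rmse(baseline, test))
```

Any algorithm you write should beat the mean-rating baseline on the held-out piece; if it doesn't, it hasn't learned anything useful from the training piece.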
Addendum: A somewhat related and scary application of this sort of thing is predicting demographic information: a user's gender, age, household income, IQ, sexual orientation, etc. You could probably do most of these attributes with the Netflix Prize dataset with a fairly high degree of accuracy. Fortunately everyone in that dataset is just a number.
Take a look at pysuggest, a Python library that implements a variety of recommendation algorithms for collaborative filtering (which is used by Amazon.com).