Evaluation of user-based collaborative filtering K-Nearest Neighbor Algorithm - recommendation-engine

I was trying to find evaluation mechanisms of collaborative K-Nearest neighbor algorithm, but i am confused how can I evaluate this algorithm. How can I be sure that the recommendation done by this algorithm is correct or good. Actually I have also developed an algorithm that i want to compare with it. but i am not sure how can i compare and evaluate both of them. The data set used by me is of movie lens.
your people help on evaluating this recomender system will be highly appreciated.

Evaluating recommender systems is a large concern of its research and industry communities. Look at "Evaluating collaborative filtering recommender systems", a Herlocker et al paper. The people who publish MovieLens data (the GroupLens research lab at the University of Minnesota) also publish many papers on recsys topics, and the PDFs are often free at http://grouplens.org/publications/.
Check out https://scholar.google.com/scholar?hl=en&q=evaluating+recommender+systems.
In short, you should use a method that hides some data. You will train your model on a portion of the data (called "training data") and test on the remainder of the data that your model has never seen before. There's a formal way to do this called cross-validation, but the general concept of visible training data versus hidden test data is the most important.
I also recommend https://www.coursera.org/learn/recommender-systems, a Coursera course on recommender systems taught by GroupLens folks. In that course you'll learn to use LensKit, a recommender systems framework in Java that includes a large evaluation suite. Even if you don't take the course, LensKit may be just what you want.

Related

Feature selection for one class classification

I try to apply One Class SVM but my dataset contains too many features and I believe feature selection would improve my metrics. Are there any methods for feature selection that do not need the label of the class?
If yes and you are aware of an existing implementation please let me know
You'd probably get better answers asking this on Cross Validated instead of Stack Exchange, although since you ask for implementations I will answer your question.
Unsupervised methods exist that allow you to eliminate features without looking at the target variable. This is called unsupervised data (dimensionality) reduction. They work by looking for features that convey similar information and then either eliminate some of those features or reduce them to fewer features whilst retaining as much information as possible.
Some examples of data reduction techniques include PCA, redundancy analysis, variable clustering, and random projections, amongst others.
You don't mention which program you're working in but I am going to presume it's Python. sklearn has implementations for PCA and SparseRandomProjection. I know there is a module designed for variable clustering in Python but I have not used it and don't know how convenient it is. I don't know if there's an unsupervised implementation of redundancy analysis in Python but you could consider making your own. Depending on what you decide to do it might not be too tricky (especially if you just do correlation based).
In case you're working in R, finding versions of data reduction using PCA will be no problem. For variable clustering and redundancy analysis, great packages like Hmisc and ClustOfVar exist.
You can also read about other unsupervised data reduction techniques; you might find other methods more suitable.

Are there any implementations available online for filter based feature selection methods?

The selection methods I am looking for are the ones based on subset evaluation (i.e. do not simply rank individual features). I prefer implementations in Matlab or based on WEKA, but implementations in any other language will still be useful.
I am aware of the existence of CsfSubsetEval and ConsistencySubsetEval in WEKA, but they did not lead to good classification performance, probably because they suffer from the following limitation:
CsfSubsetEval is biased toward small feature subsets, which may prevent locally predictive features from being included in the selected subset, as noted in [1].
ConsistencySubsetEval use min-features bias [2] which, similarly to CsfSubsetEval, result in the selection of too few features.
I know it is "too few" because I have built classification models with larger subsets and their classification performance were relatively much better.
[1] M. A. Hall, Correlation-based Feature Subset Selection for Machine Learning, 1999.
[2] Liu, Huan, and Lei Yu, Toward integrating feature selection algorithms for classification and clustering, 2005.
Check out python scikit learn simple and efficient tools for data mining and data analysis. There are various implemented methods for feature selection, classification, evaluation and a lot of documentations and tutorials.
My search has led me to the following implementations:
FEAST toolbox: it is an interesting toolbox, developed by the University of Manchester, and provide implementations of Shannon's Information Theory functions. The implementations can be downloaded from THIS webpage, and they can be used to evaluate individual features as well as subset of features.
I have also found THIS matlab code, which is an implementation for a selection algorithm based on Interaction Information.
PY_FS: A Python Package for Feature Selection
I came across this package [1] which was just released (2021) and contains many methods with reference to their original papers.

Theoretically, can everyday computing tasks be broken down into ones solvable by a neural network?

MIT Review recently published this article about a chip from IBM, which is more or less a Artificial neural network. Why IBM’s New Brainlike Chip May Be “Historic” | MIT Technology Review
The article suggests that the chip might have borrowed a page from the future. It might be the beginning of an era of new and evolved computation power. And also talks about programming for the chip.
One downside is that IBM’s chip requires an entirely new approach to
programming. Although the company announced a suite of tools geared
toward writing code for its forthcoming chip last year (see “IBM
Scientists Show Blueprints for Brainlike Computing”), even the best
programmers find learning to work with the chip bruising, says Modha:
“It’s almost always a frustrating experience.” His team is working to
create a library of ready-made blocks of code to make the process
easier.
Which brings me to the question, can everyday computing tasks be broken down into ones solvable by a neural network (theoretically and/or practically)?
It depends from the task.
There are plenty of tasks, for which John von Neumann computers are good enough. For example calculate precise values of functions in some range, or for example apply some filter at image, or store text into db and read it from it or store prices of some products etc. This is not area where NN are needed. Of course it is possible theoretically to train NN to choose where and how to save data. Or to train it to accountancy. But for cases where big array of data need to be analyzed, and current methods doesn't feet NN can be option to consider. Or speech recognition, or buying some pictures which will become masterpieces in the future can be done with NN.

Has anyone tried to compile code into neural network and evolve it?

Do you know if anyone has tried to compile high level programming languages (java, c#, etc') into a recurrent neural network and then evolve them?
I mean that the whole process including memory usage is stored in a graph of a neural net, and I'm talking about complex programs (thinking about natural language processing problems).
When I say neural net I mean a directed weighted graphs that spreads activation, and the nodes are functions of their inputs (linear, sigmoid and multiplicative to keep it simple).
Furthermore, is that what people mean in genetic programming or is there a difference?
Neural networks are not particularly well suited for evolving programs; their strength tends to be in classification. If anyone has tried, I haven't heard about it (which considering I barely touch neural networks is not a surprise, but I am active in the general AI field at the moment).
The main reason why neural networks aren't useful for generating programs is that they basically represent a mathematical equation (numeric, rather than functional). Given some numeric input, you get a numeric output. It is difficult to interpret these in the context of a program any more complicated than simple arithmetic.
Genetic Programming traditionally uses Lisp, which is a pure functional language, and often programs are often shown as tree diagrams (which occasionally look similar to some neural network diagrams - is this the source of your confusion?). The programs are evolved by exchanging entire branches of a tree (a function and all its parameters) between programs or regenerating an entire branch randomly.
There are certainly a lot of good (and a lot of bad) references on both of these topics out there - I refrain from listing them because it isn't clear what you are actually interested in. Wikipedia covers each of these techniques, and is a good starting point.
Genetic programming is very different from Neural networks. What you are suggesting is more along the lines of genetic programming - making small random changes to a program, possibly "breeding" successful programs. It is not easy, and I have my doubts that it can be done successfully across a large program.
You may have more luck extracting a small but critical part of your program, one which has a few particular "aspects" (such as parameter values) that you can try to evolve.
Google is your friend.
Some sophisticated anti-virus programs as well as sophisticated malware use formal grammar and genetic operators to evolve against each other using neural networks.
Here is an example paper on the topic: http://nexginrc.org/nexginrcAdmin/PublicationsFiles/raid09-sadia.pdf
Sources: A class on Artificial Intelligence I took a couple years ago.
With regards to your main question, no one has ever tried that on programming languages to the best of my knowledge, but there is some research in the field of evolutionary computation that could be compared to something like that (but it's obviously a far-fetched comparison). As a matter of possible interest, I asked a similar question about sel-improving compilers a while ago.
For a difference between genetic algorithms and genetic programming, have a look at this question.
Neural networks have nothing to do with genetic algorithms or genetic programming, but you can obviously use either to evolve neural nets (as any other thing for that matters).
You could have look at genetic-programming.org where they claim that they have found some near human competitive results produced by genetic programming.
I have not heard of self-evolving and self-imrpvoing programs before. They may exist as special research tools like genetic-programming.org have but nothing solid for generic use. And even if they exist they are very limited to special purpose operations like malware detection as Alain mentioned.

Project ideas for discrete mathematics course using MATLAB? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
A professor asked me to help making a specification for a college project.
By the time the students should know the basics of programming.
The professor is a mathematician and has little experience in other programming languages, so it should really be in MATLAB.
I would like some projects ideas. The project should
last about 1 to 2 months
be done individually
have web interface would be great
doesn't necessary have to go deep in maths, but some would be great
use a database (or store data in files)
What kind of project would make the students excited?
If you have any other tips I'll appreciate.
UPDATE: The students are sophomores and have already studied vector calculus. This project is for an one year Discrete Mathematics course.
UPDATE 2: The topics covered in the course are
Formal Logic
Proofs, Recursion, and Analysis of Algorithms
Sets and Combinatorics
Relations, Functions, and Matrices
Graphs and Trees
Graph Algorithms
Boolean Algebra and Computer Logic
Modeling Arithmetic, Computation, and Languages
And it'll be based on this book Mathematical Structures for Computer Science: A Modern Approach to Discrete Mathematics by Judith L. Gersting
General Suggestions:
There are many teaching resources at The MathWorks that may give you some ideas for course projects. Some sample links:
The MATLAB Central blogs, specifically some posts by Loren that include using LEGO Mindstorms in teaching and a webinar about MATLAB for teaching (note: you will have to sign up to see the webinar)
The Curriculum Exchange: a repository of course materials
Teaching with MATLAB and Simulink: a number of other links you may find useful
Specific Suggestions:
One of my grad school projects in non-linear dynamics that I found interesting dealt with Lorenz oscillators. A Lorenz oscillator is a non-linear system of three variables that can exhibit chaotic behavior. Such a system would provide an opportunity to introduce the students to numerical computation (iterative methods for simulating systems of differential equations, stability and convergence, etc.).
The most interesting thing about this project was that we were using Lorenz oscillators to encode and decode signals. This "encrypted communication" aspect was really cool, and was based on the following journal article:
Kevin M. Cuomo and Alan V. Oppenheim,
Circuit Implementation of Synchronized Chaos with Applications
to Communications, Physical Review
Letters 71(1), 65-68 (1993)
The article addresses hardware implementations of a chaotic communication system, but the equivalent software implementation should be simple enough to derive (and much easier for the students to implement!).
Some other useful aspects of such a project:
The behavior of the system can be visualized in 2-D and 3-D plots, thus exposing the students to a number of graphing utilities in MATLAB (PLOT, PLOT3, COMET, COMET3, etc.).
Audio signals can be read from files, encrypted using the Lorenz equations, written out to a new file, and then decrypted once again. You could even have the students each encrypt a signal with their Lorenz oscillator code and give it to another student to decrypt. This would introduce them to various file operations (FREAD, FWRITE, SAVE, LOAD, etc.), and you could even introduce them to working with audio data file formats.
You can introduce the students to the use of the PUBLISH command in MATLAB, which allows you to format M-files and publish them to various output types (like HTML or Word documents). This will teach them techniques for making useful help documentation for their MATLAB code.
I have found that implementing and visualizing Dynamical systems is great
for giving an introduction to programming and to an interesting branch of
applied mathematics. Because one can see the 'life' in these systems,
our students really enjoy this practical module.
We usually start off by visualizing a 1D attractor, so that we can
overlay the evolution rule/rate of change with the current state of
the system. That way you can teach computational aspects (integrating the system) and
visualization, and the separation of both in implementation (on a simple level, refreshing
graphics at every n-th computation step, but in C++ leading to threads, unsure about MATLAB capabilities here).
Next we add noise, and then add a sigmoidal nonlinearity to the linear attractor. We combine this extension with an introduction to version control (we use a sandbox SVN repository for this): The
students first have to create branches, modify the evolution rule and then merge
it back into HEAD.
When going 2D you can simply start with a rotation and modify it to become a Hopf oscillator, and visualize either by morphing a grid over time or by going 3D when starting with a distinct point. You can also visualize the bifurcation diagram in 3D. So you again combine generic MATLAB skills like 3D plotting with the maths.
To link in other topics, browse around in wikipedia: you can bring in hunter/predator models, chaotic systems, physical systems, etc.etc.
We usually do not teach object-oriented-programming from within MATLAB, although it is possible and you can easily make up your own use cases in the dynamical systems setting.
When introducing inheritance, we will already have moved on to C++, and I'm again unaware of MATLAB's capabilities here.
Coming back to your five points:
Duration is easily adjusted, because the simple 1D attractor can be
done quickly and from then on, extensions are ample and modular.
We assign this as an individual task, but allow and encourage discussion among students.
About the web interface I'm at a loss: what exactly do you have in mind, why is it
important, what would it add to the assignment, how does it relate to learning MATLAB.
I would recommend dropping this.
Complexity: A simple attractor is easily understood, but the sky's the limit :)
Using a database really is a lot different from config files. As to the first, there
is a database toolbox for accessing databases from MATLAB. Few institutes have the license though, and apart from that: this IMHO does not belong into such a course. I suggest introducing to the concept of config files, e.g. for the location and strength of the attractor, and later for the system's respective properties.
All this said, I would at least also tell your professor (and your students!) that Python is rising up against MATLAB. We are in the progress of going Python with our tutorials, but I understand if someone wants to stick with what's familiar.
Also, we actually need the scientific content later on, so the usefulness for you will probably depend on which department your course will be related to.
A lot of things are possible.
The first example that comes in mind is to model a public transportation network (the network of your city, with underground, buses, tramways, ...). It is represented by a weighted directed graph (you can use sparse matrix to represent it, for example).
You may, for example, ask them to compute the shortest path from one station to another one (Moore-dijkistra algorithm, for example) and display it.
So, for the students, the several steps to do are:
choose an appropriate representation for the network (it could be some objects to represent the properties of the stations and the lines, and a sparse matrix for the network)
load all the data (you can provide them the data in an XML file)
be able to draw the network (since you will put the coordinates of the stations)
calculate the shortest path from one point to another and display it in a pretty way
create a fronted (with GUI)
Of course, this could be complicated by adding connection times (when you change from one line to another), asking for several options (shortest path with minimum connections, take in considerations the time you loose by waiting for a train/bus, ...)
The level of details will depend on the level of the students and the time they could spend on it (it could be very simple, or very realist)
You want to do a project with a web interface and a database, but not any serious math... and you're doing it in MATLAB? Do you understand that MATLAB is especially designed to be used for "deep math", and not for web interfaces or databases?
I think if this is an intro to a Discrete Mathematics course, you should probably do something involving Discrete Mathematics, and not waste the students' time as they learn a bunch of things in that language that they'll never actually use.
Why not do something involving audio? I did an undergraduate project in which we used MATLAB to automatically beat-match different tunes and DJ mix between them. The full program took all semester, but you could do a subset of it. wavread() and the like are built in and easy to use.
Or do some simple image processing like finding Waldo using cross-correlation.
Maybe do something involving cryptography, have them crack a simple encryption scheme and feel like hackers.
MATLAB started life as a MATrix LAB, so maybe concentrating on problems in linear algebra would be a natural fit.
Discrete math problems using matricies include:
Spanning trees and shortest paths
The marriage problem (bipartite graphs)
Matching algorithms
Maximal flow in a network
The transportation problem
See Gil Strang's "Intro to Applied Math" or Knuth's "Concrete Math" for ideas.
You might look here: http://www.mathworks.com/academia/student_center/tutorials/launchpad.html
on the MathWorks website. The interactive tutorial (second link) is quite popular.
--Loren
I always thought the one I was assigned in grad school was a good choice-a magnetic lens simulator. The math isn't completely overwhelming so you can focus more on learning the language, and it's a good intro to the graphical capabilities (e.g., animating the path of an off-axis electron going through the lens).
db I/O and fancy interfaces are out of place in a discrete math course.
my matlab labs were typically algorithm implementations, with charts as output, and simple file input.
how hard is the material? image processing is really easy in matlab, can you do some discrete 2D filtering? blurs and stuff. http://homepages.inf.ed.ac.uk/rbf/HIPR2/filtops.htm