Understanding the Pearson Correlation Coefficient - recommendation-engine

As part of the calculations to generate a Pearson Correlation Coefficient, the following computations are performed:

    w(a,u) = Σ_i (r_a,i − r̄_a)(r_u,i − r̄_u) / ( sqrt(Σ_i (r_a,i − r̄_a)²) · sqrt(Σ_i (r_u,i − r̄_u)²) )

    p_a,i = r̄_a + Σ_{u=1..n} (r_u,i − r̄_u) · w(a,u) / Σ_{u=1..n} |w(a,u)|

In the second formula, p_a,i is the predicted rating user a would give item i, n is the number of similar users being compared to, and r_u,i is the rating of item i by user u.
What value will be used if user u has not rated this item? Did I misunderstand anything here?

According to the link, the earlier calculations in step 1 of the algorithm are over a set of items, indexed 1 to m, where m is the total number of items the two users have in common.
Step 3 of the algorithm specifies: "To find a rating prediction for a particular user for a particular item, first select a number of users with the highest, weighted similarity scores with respect to the current user that have rated on the item in question."
These calculations are performed only on the intersection of the users' sets of rated items; no calculation is performed when a user has not rated an item.

It only makes sense to calculate results if both users have rated a movie. Linear regression can be visualised as a method of finding a straight line through a two-dimensional graph, where one variable is plotted on the X axis and the other on the Y axis. Each combination of ratings is represented as a point [u1_rating, u2_rating] on a Euclidean plane. Since you cannot plot points that have only one dimension, you have to discard those cases.
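To make the intersection step concrete, here is a small MATLAB sketch with made-up rating vectors, where 0 marks an unrated item (the vectors and names are illustrative, not taken from the linked algorithm):

    % Ratings of two users over the same items; 0 means "not rated".
    rA = [5 3 0 4 0 2 1];
    rB = [4 0 2 5 3 1 0];

    % Keep only the items both users have rated (the intersection).
    both = rA > 0 & rB > 0;
    a = rA(both);
    b = rB(both);

    % Pearson correlation over the co-rated items only.
    w = sum((a - mean(a)) .* (b - mean(b))) / ...
        (sqrt(sum((a - mean(a)).^2)) * sqrt(sum((b - mean(b)).^2)));
    fprintf('similarity over %d co-rated items: %.4f\n', numel(a), w);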

Related

Query about NaiveBayes Classifier

I am building a text classifier for classifying reviews as positive or negative, and I have a query about the Naive Bayes classifier formula:
    P(label|features) = P(label) * P(f1|label) * ... * P(fn|label) / P(features)
As per my understanding, probabilities are multiplied when events occur together, e.g. the probability of A and B occurring together. Is it appropriate to multiply the probabilities in this case? I'd appreciate it if someone could explain this formula in a bit more detail. I am trying to do some manual classification (just to check some algorithm-generated classifications that seem a tad off; this will let me identify the exact reason for the misclassification).
In basic probability terms, to calculate p(label|feature1, feature2), we have to multiply the probabilities to calculate the occurrence of feature 1 and feature 2 together. But in this case I am not trying to calculate a standard probability, rather the strength of positivity/negativity of the text. So if I sum up the probabilities, I get a number that can identify the positivity/negativity quotient. This is a bit unconventional, but do you think it can give good results? The reason I ask is that the sum and the product can be quite different, e.g. 2*2 = 4 but 3*1 = 3.
The class-conditional probabilities P(feature|label) can be multiplied together if they are statistically independent. However, it's been found in practice that Naive Bayes still produces good results even for class-conditional probabilities that are not independent. Thus, you can compute the individual class-conditional probabilities P(feature|label) from simple counting and then multiply them together.
One thing to note is that in some applications, these probabilities can be extremely small, resulting in potential numerical underflow. Thus, you may want to add together the logs of the probabilities (rather than multiply the probabilities).
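As a minimal MATLAB sketch of that log-probability trick, assume the class-conditional probabilities for the features of one review have already been estimated by counting (all values and variable names below are illustrative):

    % Hypothetical class-conditional probabilities P(f_i | label)
    % for the features found in one review.
    pFeatGivenPos = [0.10 0.05 0.02];
    pFeatGivenNeg = [0.01 0.04 0.06];
    pPos = 0.5;  pNeg = 0.5;          % class priors P(label)

    % Summing logs avoids underflow from multiplying many small
    % probabilities; the argmax is unchanged since log is monotonic.
    scorePos = log(pPos) + sum(log(pFeatGivenPos));
    scoreNeg = log(pNeg) + sum(log(pFeatGivenNeg));

    % P(features) is identical for both labels, so it can be dropped
    % when we only need the most likely label.
    if scorePos > scoreNeg
        disp('predicted label: positive')
    else
        disp('predicted label: negative')
    end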
I would understand it if the features were different, like the probability of a person being male given a height of 170 cm and a weight of 200 pounds; those probabilities have to be multiplied together because the conditions (events) occur together. But in the case of text classification this is not valid, as it really doesn't matter whether the events occur together. E.g. if the probability of a review being positive given the occurrence of the word "best" is 0.1, and the probability of it being positive given the occurrence of the word "polite" is 0.05, then the probability of the review being positive given both words (best and polite) is not 0.1*0.05. A more indicative number would be the sum of the probabilities (suitably normalized).

How do I know the confidence level of my correct probability score?

I have a writer recognition system that gives back an NLL (negative log-likelihood) score for a test sample against every trained model. For example, if there are thirteen models to compare the sample against, the NLL output will look like this:
15885.1881156907 17948.1931699086 17205.1548161452 16846.8936368077 20798.8048757930 18153.8179076007 18972.6746781821 17398.9047592641 19292.8326540969 22559.3178790489 17315.0994094185 19471.9518308519 18867.2297851016
Where each column represents the score for that sample against every model. Column 1 gives the score against model 1 and so on.
This test sample was written by the writer of model 1, so the first column should have the minimum value for a correct prediction.
The output I provided here gives the desired prediction, as the value in column 1 is the minimum.
When I presented my results, I was asked how confident I was about the scores or the predicted values, and to provide a confidence level for each score.
I did some reading after this and found many posts about the 95% confidence interval, which shows up in every result of my Google query, but it does not appear to be what I need.
The reason I need this: suppose for a test sample I have scores from two models; using the confidence level, I should be able to decide which score to pick.
For example for the same test sample the scores from another model are:
124494.535128967 129586.451168849 126269.733526396 129579.895935672 128582.387405272 125984.657455834 127486.755531507 125162.136816278 129790.811437270 135902.112799503 126599.346536290 136223.382395325 126182.202727967
Both predict correctly, since in both cases the score in column 1 is the minimum. But again, how do I find the confidence level of my score?
Would appreciate any guidance here.
As far as I know, you cannot evaluate a confidence level for a single value.
Suppose you store your results in a matrix where each column corresponds to a model and each row corresponds to an example (or observation). You can evaluate the confidence for every single model by using all the predicted results from that model (i.e. you can evaluate the confidence interval for any column of the matrix) according to the following procedure:
Evaluate the mean value of the column; call this µ.
Evaluate the standard deviation of the column; call this σ.
Evaluate the standard error of the mean as ε = σ/sqrt(N), where N is the number of samples (rows).
The lower bound of the confidence interval is then given by µ − 2ε and the upper bound by µ + 2ε; by straightforward subtraction you can find the amplitude of this interval. The closer it is to zero, the more accurate your measurement.
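A short MATLAB sketch of this procedure, assuming scores is an N-by-M matrix of NLL values with one row per test sample and one column per model (the variable names are illustrative):

    % scores: N-by-M matrix, rows = test samples, columns = models.
    mu      = mean(scores);          % per-column mean, µ
    sigma   = std(scores);           % per-column standard deviation, σ
    N       = size(scores, 1);       % number of samples (rows)
    epsilon = sigma / sqrt(N);       % standard error of the mean, ε

    lower = mu - 2*epsilon;          % approximate 95% interval bounds
    upper = mu + 2*epsilon;
    width = upper - lower;           % amplitude of each interval (4ε)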
Hope this is what you're looking for.

iBeacon: Linear Approximation Model (LAM)

First I want to calibrate my beacons: I stand 1 meter away, collect 60 RSSI values, and take their average. That gives me the received signal power at 1 m distance from my beacon.
Now I want to calculate the distance based on the following formula:

    d = 10^((A − RSSI) / (10 · K))

where:
A represents the received signal power at 1 meter distance,
K represents the exponent of the path loss,
d represents the distance.
K depends on the room in which I want to calculate the distance. What is the best way to determine K for this setup?
Essentially, you need to solve for K and A. To do this, repeat the calibration procedure at other distances to get more data points, so that you have multiple d values and multiple RSSI values. Then run a regression to find the best-fit values for K and A.
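A sketch of such a calibration regression in MATLAB; the distances and RSSI readings below are made up for illustration:

    % Calibration data: known distances (m) and averaged RSSI (dBm).
    d    = [0.5 1 2 4 8];
    rssi = [-52 -59 -65 -71 -78];

    % RSSI = A - 10*K*log10(d) is linear in log10(d), so an ordinary
    % least-squares fit recovers both parameters at once.
    p = polyfit(log10(d), rssi, 1);   % p(1) = slope, p(2) = intercept
    K = -p(1) / 10;                   % path loss exponent
    A = p(2);                         % RSSI at 1 m, since log10(1) = 0

    % Invert the model to predict distance from a new reading.
    rssiNew = -68;
    dEst = 10^((A - rssiNew) / (10*K));
    fprintf('K = %.2f, A = %.1f dBm, d = %.2f m\n', K, A, dEst);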
That said, I doubt you will have much success with this formula. I have not been able to use it to accurately predict distance. I have found this formula to be a better predictor.

Determine the position and value of peak

I have a graph with five major peaks. I'd like to find the position and value of the first peak (the one furthest to the right). I have more than 100 different plots of this, and the peak grows and shrinks in size across the plots, so I will need to use a for loop. I'm stuck on determining the x and y values to a large number of significant figures in Matlab code.
Here's one of the many plots:
If you know for sure you're always going to have 5 peaks, I think the FileExchange function extrema will be very helpful; see here.
It returns the maxima (and minima, if needed) in descending order, so the first elements of the outputs zmax and imax are the maximal value and its index, the second elements are the second-largest value and its index, and so on.
If the peak you need is always the smallest of the five, you'll just need zmax(5) and imax(5), i.e. the fifth-largest maximum and its index.
If you have access to the Signal Processing Toolbox, findpeaks is the function you are looking for. It can be invoked with various options, including the number of peaks, which is helpful when that information is available.
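For instance, a minimal findpeaks sketch; the signal below is synthetic, just to demonstrate the options:

    % Synthetic signal with five peaks, standing in for one of the plots.
    x = linspace(0, 10, 1000);
    y = exp(-(x-1).^2/0.05) + 0.6*exp(-(x-3).^2/0.1) + ...
        0.8*exp(-(x-5).^2/0.1) + 0.4*exp(-(x-7).^2/0.1) + ...
        0.9*exp(-(x-9).^2/0.1);

    % Return at most 5 peaks, sorted by height in descending order.
    [pks, locs] = findpeaks(y, x, 'NPeaks', 5, 'SortStr', 'descend');

    % pks(k) is the k-th largest peak value, locs(k) its x position.
    fprintf('tallest peak: y = %.6f at x = %.6f\n', pks(1), locs(1));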

What's the significance of a negative NCC coefficient w.r.t. image template matching?

I have been using Matlab's normxcorr2 function to do template matching with images by performing normalized cross correlation. To find the maximum correspondence between a template and an image, one can simply run normxcorr2 and then find the maximum absolute value of all the values returned by normxcorr2 (the function returns values between -1.0 and 1.0).
From a quick Google search, I found out that a negative correlation coefficient means an inverse relationship between two variables (e.g. as x increases, y decreases), and that a positive correlation coefficient means the opposite (e.g. as x increases, y increases). How does this apply to image template matching? That is, what does a negative value from normxcorr2 mean conceptually with respect to template matching?
View normalized cross-correlation as a normalized vector dot product. If the angle between two vectors is zero, their dot product will be 1; if they point in opposite directions, their dot product will be negative 1. This idea carries over directly if you take each array and stack its columns end to end: the result is essentially a dot product between two vectors.
Also, just as a personal anecdote: what confused me about template matching at first was that I intuitively believed element-wise subtraction of two images would be a good metric for image similarity. When I first saw cross-correlation, I wondered why it used element-wise multiplication. Then I realized the latter operation is the same thing as a vector dot product, which, as mentioned above, indicates when two vectors point in the same direction. In your case, the two vectors are normalized first, hence the range from -1 to 1. If you want to read more about the implementation, "Fast Normalized Cross Correlation" by J.P. Lewis is a classic paper on the subject.
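As a quick numerical check of that stacking argument, here is a small MATLAB sketch with random patches (the patch sizes are arbitrary; corr2 is from the Image Processing Toolbox):

    % Two equal-size patches, standing in for template and image region.
    f = rand(8);
    t = rand(8);

    % Stack columns end to end, subtract the mean, scale to unit length.
    fv = f(:) - mean(f(:));  fv = fv / norm(fv);
    tv = t(:) - mean(t(:));  tv = tv / norm(tv);

    % The NCC coefficient is the dot product of these unit vectors,
    % so it must lie between -1 and 1.
    nccManual = dot(fv, tv);

    % corr2 computes the same 2-D correlation coefficient directly.
    nccBuiltin = corr2(f, t);
    fprintf('manual: %.6f, corr2: %.6f\n', nccManual, nccBuiltin);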
Check the normalized cross-correlation formula on Wikipedia:

    (1/n) · Σ_{x,y} (f(x,y) − mean(f)) · (t(x,y) − mean(t)) / (std(f) · std(t))
When f(x,y) − mean(f) and t(x,y) − mean(t) have different signs, the corresponding term of the sum is negative (the standard deviations are always positive). If there are many such (x,y), the whole sum will be negative. You can think of it this way: 1.0 means one image is equal to the other, while -1.0 means one image is the negative of the other (try normxcorr2(x, -x)).
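A quick way to see both extremes with normxcorr2 itself (the image here is random, and the template is cut out of it, so the values and sizes are purely illustrative):

    % Random image and a template cut out of it.
    img      = rand(64);
    template = img(20:35, 20:35);

    % Peak near +1: the template matches its source region.
    c = normxcorr2(template, img);
    fprintf('max NCC vs. original image: %.4f\n', max(c(:)));

    % Matching against the photographic negative flips the sign,
    % so the best match shows up as a value near -1.
    cNeg = normxcorr2(template, 1 - img);
    fprintf('min NCC vs. inverted image: %.4f\n', min(cNeg(:)));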