Recommendation System based on tags - recommendation-engine

I want to design a recommendation system that goes like this:
I have many restaurants with different tags
I have users making searches using those tags
I want to make recommendations based on the tags the users have searched most
I am not looking for a complicated algorithm.
For example:
I have a user who has searched:
tag1 - n1 times
tag2 - n2 times
tag3 - n3 times
tag4 - n4 times
tag5 - n5 times
And there are 3 restaurants with the corresponding tags:
restaurant1: tag1, tag2, tag4, other_tag
restaurant2: tag5, other_tag
restaurant3: tag1, tag4, other_tag, other_tag
I was thinking about the following logic:
Let n = n1 + n2 + n3 + n4 + n5
Let t_i = the number of tags for the i_th restaurant
Then I'll compute:
R1 = sum(is_tag_i_in_restaurant1 * ni) / t_1, where i goes from 1 to 5
R2 = sum(is_tag_i_in_restaurant2 * ni) / t_2, where i goes from 1 to 5
R3 = sum(is_tag_i_in_restaurant3 * ni) / t_3, where i goes from 1 to 5
T1 = n / t_1
T2 = n / t_2
T3 = n / t_3
And now for each restaurant I will compare Ri with Ti. Let's say that if Ri >= Ti/2, I will consider it a recommendation.
Is this a good way of doing this? Can you recommend something more efficient?

There's scholarly research on tags at Google Scholar and http://grouplens.org/publications, in particular the work of Jesse Vig, Shilad Sen, Tien Nguyen, and John Riedl. A lot of thought has gone into predicting tag preference as well. Check out the Tagommenders paper. In general, users and items (restaurants, movies, or other things) all have values for tags on some scale, such as preference or appropriateness. You then find nearest neighbors using a similarity measure such as cosine similarity.
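As a rough illustration of the cosine-similarity idea, here is a minimal Python sketch. It assumes the user's vector is simply the search counts from the question and that each restaurant is a binary tag vector (1 if it has the tag, 0 otherwise). The function names and the sample counts are made up for the example and do not come from any of the cited papers.

```python
import math

def cosine_similarity(user_counts, restaurant_tags):
    """Cosine similarity between a user's tag counts and a restaurant's tag set.

    Assumption: the restaurant is a binary vector over its tags.
    """
    # Dot product: only tags the restaurant actually has contribute.
    dot = sum(count for tag, count in user_counts.items() if tag in restaurant_tags)
    user_norm = math.sqrt(sum(c * c for c in user_counts.values()))
    rest_norm = math.sqrt(len(restaurant_tags))
    if user_norm == 0 or rest_norm == 0:
        return 0.0
    return dot / (user_norm * rest_norm)

def rank_restaurants(user_counts, restaurants):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = {name: cosine_similarity(user_counts, tags)
              for name, tags in restaurants.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical search counts n1..n5 and the three restaurants from the question.
user_counts = {"tag1": 5, "tag2": 1, "tag3": 0, "tag4": 3, "tag5": 1}
restaurants = {
    "restaurant1": {"tag1", "tag2", "tag4", "other_a"},
    "restaurant2": {"tag5", "other_a"},
    "restaurant3": {"tag1", "tag4", "other_a", "other_b"},
}
ranking = rank_restaurants(user_counts, restaurants)
```

One thing worth noticing about the rule in the question: since Ri and Ti are both divided by t_i, the test Ri >= Ti/2 reduces to "the matched search counts add up to at least n/2", so the number of tags on the restaurant drops out. Cosine similarity keeps a penalty for restaurants with many unrelated tags through the restaurant's norm.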


Nested double sort in Matlab

Suppose I have 3 vectors, vector A which is (n x 1), vector B which is (n x 1) and vector C which is (n x 1).
I want to sort the elements of A, into 5 groups, and then within those groups I want to sort the respective elements of B into 5 groups as well. And then take the average of the elements in C. So I will have 25 averages.
In other words:
Sort the elements of A into 5 quintiles;
Pick the first group of elements in A, get the corresponding values in B;
Sort the picked elements of B into 5 groups.
Take the average of each group from C.
Pick the second group of elements in A, get the corresponding values in B;
Sort the picked elements of B into 5 groups.
Take the average of each group from C.
And so on and so forth.
Here's my dummy code for this:
minimum = 50;
maximum = 100;
A = (maximum-minimum).*rand(1000,1) + minimum;
B = (maximum-minimum).*rand(1000,1) + minimum;
C = (maximum-minimum).*rand(1000,1) + minimum;
nbins1 = 5;
nbins2 = 5;
bins1 = ceil(nbins1 * tiedrank(A) / length(A));
for i = 1:nbins1
    B1 = B(bins1 == i);
    C1 = C(bins1 == i);
    % Bin B within this A-group; use nbins2 here, not nbins1
    bins2 = ceil(nbins2 * tiedrank(B1) / length(B1));
    for j = 1:nbins2
        C2 = C1(bins2 == j);
        output(i,j) = mean(C2);
        clearvars C2
    end
    clearvars B1 C1
end
The issue is that this does not seem very elegant or efficient. Is there any other way of doing this? For people in Finance, this problem is analogous to the Fama-French (1993) double sorting of portfolios.
First of all, sort everything by column A:
sortedByA = sortrows([A,B,C], 1);
Create a dummy vector representing indices of each group in A (from 1 to nbins1):
groupsA = repmat(1:nbins1, 1000/nbins1, 1); groupsA = groupsA(:);
Then re-sort again (by first two columns), but replacing actual column A with group indices, which would in effect sort B within each group of values in A:
sorted = sortrows([groupsA, sortedByA(:,[2,3])], [1,2]);
Create indices for groups in column C (from 1 to nbins1*nbins2):
groupsC = repmat(1:(nbins1*nbins2), 1000/(nbins1*nbins2), 1); groupsC = groupsC(:);
Finally, compute mean within each group:
averages = accumarray(groupsC, sorted(:,3), [], @mean);
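For readers who want to check the logic outside MATLAB, the same sort-then-chunk idea can be sketched in plain Python. This is only an illustration: `split_into_groups` and `double_sort_means` are names invented here, groups are formed as contiguous chunks of the sorted data (so ties are broken by sort order rather than by `tiedrank`), and it assumes there are at least nbins1*nbins2 observations so that no group is empty.

```python
from statistics import mean

def split_into_groups(items, n_groups):
    """Split a list into n_groups contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups

def double_sort_means(A, B, C, nbins1=5, nbins2=5):
    """Return an nbins1 x nbins2 table of means of C.

    Rows are groups of A; columns are groups of B formed within each A-group.
    """
    rows = sorted(zip(A, B, C), key=lambda r: r[0])       # sort everything by A
    output = []
    for group_a in split_into_groups(rows, nbins1):
        by_b = sorted(group_a, key=lambda r: r[1])        # sort by B within the A-group
        output.append([mean([r[2] for r in g])
                       for g in split_into_groups(by_b, nbins2)])
    return output
```

With 1000 observations and 5 x 5 bins, each cell averages 40 values of C, which matches the group sizes the MATLAB answer hard-codes via 1000/(nbins1*nbins2).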

Matlab genetic algorithms in portfolio management

I would like to try genetic algorithms in portfolio management, but I don't know what the main function and constraints should look like.
I have a matrix with stock prices, a vector with weights, and a script that calculates the portfolio price and the portfolio return/risk (std) ratio. I want to use a genetic algorithm in MATLAB so that different combinations of weights can be tested and the optimal portfolio found (optimal meaning the highest return/risk (std) ratio).
prices - matrix where columns represent different stocks and rows represent daily prices.
w - vector with weights [0.333, 0.333, 0.333]
script that calculates portfolio performance:
[d, n] = size(prices); % d = number of days (rows), n = number of stocks (columns)
ap = zeros(1, d);      % portfolio price for each day
for j = 1:d
    temp = 0;
    for i = 1:n
        temp = temp + prices(j,i) * w(i);
    end
    ap(j) = temp;
end
port_performance = rr_ratio(ap); %calculates return/risk(std) ratio.
I need to find the best combination of weights, so that port_performance has its maximum value. What should the GA function look like so that sum(w) = 1 and each element of w >= 0?
Thank you
This is an extremely open-ended question. There is no one perfect way to apply genetic algorithms to portfolio optimization. Generally, what you would do is something like the following:
Generate a large number of candidate portfolios at random, that satisfy your constraints.
Evaluate each portfolio according to your "fitness metric", which is presumably the return/risk ratio.
Choose a subset of your portfolios to "reproduce" and kill the rest. Generally you do something like choosing the top 50% by performance.
"Breed" some new portfolios. You can do this by asexual reproduction (i.e. clone your old portfolios) or sexual reproduction (pick the old portfolios in pairs and combine them somehow to generate a new portfolio).
Introduce mutations into the portfolios with some small mutation rate (say p = 0.01). For example, you could randomly move some of the weights up/down, or randomly swap the weights for a couple of different stocks.
You now have a new population of portfolios, and you can start again.
To generate your random portfolios to begin with so that each w(i) >= 0 and sum(w) = 1 you could just do
>> w = rand(numPortfolios, numStocks);
>> w = bsxfun(@rdivide, w, sum(w,2));
Now each row of w is a candidate set of portfolio weights.
To breed two portfolios you could just take the average
>> wNew = 0.5 * (w1 + w2);
Or you could select elements at random from each portfolio and then renormalize to ensure that the weights sum to 1.
>> wNew = zeros(1, numStocks);
>> x = rand(1, numStocks) < 0.5;
>> wNew( x) = w1(x);
>> wNew(~x) = w2(~x);
>> wNew = wNew / sum(wNew);
You might also consider taking a look at this paper.
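To show how the steps above fit together, here is a small self-contained sketch in Python rather than MATLAB. Several things in it are assumptions made for illustration: `fitness` is a stand-in for the asker's `rr_ratio` (mean divided by standard deviation of daily portfolio returns), mutation rescales a weight by a random factor, and all function names and default parameters are invented. Renormalizing after every crossover and mutation is what keeps sum(w) = 1 and w >= 0.

```python
import random
from statistics import mean, stdev

def normalize(w):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(w)
    return [x / total for x in w]

def fitness(w, prices):
    """Assumed stand-in for rr_ratio: mean/std of daily portfolio returns.

    Needs at least three rows of prices so that stdev has two returns.
    """
    values = [sum(p * x for p, x in zip(row, w)) for row in prices]
    returns = [b / a - 1.0 for a, b in zip(values, values[1:])]
    sd = stdev(returns)
    return mean(returns) / sd if sd > 0 else 0.0

def crossover(w1, w2, rng):
    """Pick each weight from one parent at random, then renormalize."""
    return normalize([a if rng.random() < 0.5 else b for a, b in zip(w1, w2)])

def mutate(w, rng, rate=0.01):
    """With probability `rate`, scale a weight by a random positive factor."""
    bumped = [x * rng.uniform(0.5, 1.5) if rng.random() < rate else x for x in w]
    return normalize(bumped)

def evolve(prices, pop_size=50, generations=100, seed=0):
    """Return the best weight vector found; pop_size should be at least 4."""
    rng = random.Random(seed)
    num_stocks = len(prices[0])
    population = [normalize([rng.random() for _ in range(num_stocks)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the top half by fitness and discard the rest.
        population.sort(key=lambda w: fitness(w, prices), reverse=True)
        survivors = population[: pop_size // 2]
        # Refill the population with mutated children of random survivor pairs.
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = survivors + children
    return max(population, key=lambda w: fitness(w, prices))
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next. Whether it converges to anything useful depends heavily on the fitness function and the data.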

matrix assignment from a matrix A to a matrix B using conditional statements based on a third matrix C

I have two questions if you can kindly respond:
Q1) I have a matrix choice, where each person makes one of 4 possible choices, denoted as 1, 2, 3 and 4.
I have three matrices A1, A2, A3 with income information for each person and each time period. Say I have n people and t time periods, so A1, A2, A3 are n-by-t and choice is n-by-t.
Now I want to make another matrix B, where B will pick the element from A according to the value in the choice matrix, i.e. if choice(n,t)==1, then B(n,t) = A1(n,t). If choice(n,t)==2, then B(n,t) = A2(n,t), and so on.
I have tried a for loop with an if statement, but I am unable to do it. Please help.
Q2) I have a matrix A of incomes. A is dimension n-by-t. Some people have low income, some have high income. Say anyone with income<1000 is low and above 1000 is high. At the end of my simulations, I need to know whether each person was high income or low income. How can I make a high income and low income matrix from the bigger matrix?
Q1:
C = choice %else the code gets too long
B = A1 .* (C==1) + A2 .* (C==2) + A3 .* (C==3)
I'm not sure how you want to handle the value '4' in B if you only have A1 A2 A3, but this should work.
[EDIT]:
If the choice is '4', that element of B will be 0 for the B I defined above.
Q2:
this one's a little vague. Maybe this is what you wanted:
HighIncome = A > 1000
LowIncome = A <= 1000
If this doesn't do it, please explain your objective more precisely.
[EDIT]:
Based on your slightly less vague explanation of Q2, it sounds like you want something like this:
A_high_income = A .* (A > 1000)
A_low_income = A .* (A <= 1000)
CHOICE_high_income = choice .* (A > 1000)
CHOICE_low_income = choice .* (A <= 1000)
The high income matrices have zeros at the low-income positions and vice versa.
This doesn't make very much sense IMHO, but it's the closest I could get to your description.
If this doesn't do it, follow the instructions in my comment below and post some examples.
Q1: You can use three simple statements and some logical indexing.
B = A1;
B(choice == 2) = A2(choice == 2);
B(choice == 3) = A3(choice == 3);
Q2: To separate A and choice into two parts based on income, you first find the indices of "low income" rows and use that to get rows from the matrices.
lowIncomeNdx = any(A < 1000, 2);
lowIncome = A(lowIncomeNdx, :);
lowIncomeChoice = choice(lowIncomeNdx, :);
highIncome = A(~lowIncomeNdx, :);
highIncomeChoice = choice(~lowIncomeNdx, :);
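The two Q1 answers differ in one detail: with the masked sum, a cell whose choice is 4 ends up as 0, whereas with `B = A1` followed by overwrites it keeps the value from A1. The element-wise selection itself can be sketched in plain Python with nested lists, just to make the rule explicit. The function name, the `default` parameter and the sample matrices are all invented for this illustration.

```python
def select_by_choice(choice, sources, default=0):
    """Build B where B[r][c] is taken from the matrix keyed by choice[r][c].

    `sources` maps a choice value to its matrix, e.g. {1: A1, 2: A2, 3: A3}.
    Cells whose choice has no matrix (such as 4 here) get `default`.
    """
    return [
        [sources[k][r][c] if k in sources else default
         for c, k in enumerate(row)]
        for r, row in enumerate(choice)
    ]

# Hypothetical 2-by-2 example (n = 2 people, t = 2 periods).
A1 = [[1, 2], [3, 4]]
A2 = [[10, 20], [30, 40]]
A3 = [[100, 200], [300, 400]]
choice = [[1, 2], [3, 4]]
B = select_by_choice(choice, {1: A1, 2: A2, 3: A3})
```

Making the fallback for choice 4 an explicit argument forces a decision about what that cell should contain, which neither answer could settle from the question as posted.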

Levenshtein Distance Formula in CoffeeScript?

I am trying to create or find a CoffeeScript implementation of the Levenshtein Distance formula, aka Edit Distance. Here is what I have so far, any help at all would be much appreciated.
levenshtein = (s1,s2) ->
  n = s1.length
  m = s2.length
  if n < m
    return levenshtein(s2, s1)
  if not s1
    return s2.length
  previous_row = [s2.length + 1]
  for c1, i in s1
    current_row = [i + 1]
    for c2, j in s2
      insertions = previous_row[j + 1] + 1
      deletions = current_row[j] + 1
      substitutions = previous_row[j] # is this unnescessary?-> (c1 != c2)
      current_row.push(Math.min(insertions,deletions,substitutions))
    previous_row = current_row
  return previous_row[previous_row.length-1]
#End Levenshetein Function
Btw: I know this code is wrong on many levels, I am happy to receive any and all constructive criticism. Just looking to improve, and figure out this formula!
CodeEdit1: Patched up the errors Trevor pointed out, current code above includes those changes
Update: The question I am asking is - how do we do Levenshtein in CoffeeScript?
Here are the steps of the Levenshtein Distance Algorithm, to help you see what I am trying to accomplish.
Steps
1 Set n to be the length of s.
  Set m to be the length of t.
  If n = 0, return m and exit.
  If m = 0, return n and exit.
  Construct a matrix containing 0..m rows and 0..n columns.
2 Initialize the first row to 0..n.
  Initialize the first column to 0..m.
3 Examine each character of s (i from 1 to n).
4 Examine each character of t (j from 1 to m).
5 If s[i] equals t[j], the cost is 0.
  If s[i] doesn't equal t[j], the cost is 1.
6 Set cell d[i,j] of the matrix equal to the minimum of:
  a. The cell immediately above plus 1: d[i-1,j] + 1.
  b. The cell immediately to the left plus 1: d[i,j-1] + 1.
  c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost.
7 After the iteration steps (3, 4, 5, 6) are complete, the distance is found in cell d[n,m].
source:http://www.merriampark.com/ld.htm
This page (linked to from the resource you mentioned) offers a JavaScript implementation of the Levenshtein distance algorithm. Based on both that and the code you posted, here's my CoffeeScript version:
LD = (s, t) ->
  n = s.length
  m = t.length
  return m if n is 0
  return n if m is 0

  d = []
  d[i] = [] for i in [0..n]
  d[i][0] = i for i in [0..n]
  d[0][j] = j for j in [0..m]

  for c1, i in s
    for c2, j in t
      cost = if c1 is c2 then 0 else 1
      d[i+1][j+1] = Math.min d[i][j+1]+1, d[i+1][j]+1, d[i][j] + cost

  d[n][m]
It seems to hold up to light testing, but let me know if there are any problems.
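For cross-checking either CoffeeScript version, here is a sketch of the two-row variant in Python, which is the shape the code in the question seems to be aiming for. Compared with the question's code, it differs in two places: the initial row is the full range 0..m rather than a one-element array, and the substitution cost adds 1 when the characters differ. Treat it as a reference for testing rather than the definitive implementation.

```python
def levenshtein(s, t):
    """Edit distance between s and t, keeping only two rows of the matrix."""
    if len(s) < len(t):
        s, t = t, s                                   # make t the shorter string
    previous_row = list(range(len(t) + 1))            # distances from "" to each prefix of t
    for i, c1 in enumerate(s):
        current_row = [i + 1]                         # distance from s[:i+1] to ""
        for j, c2 in enumerate(t):
            insertions = previous_row[j + 1] + 1      # cell above plus 1
            deletions = current_row[j] + 1            # cell to the left plus 1
            substitutions = previous_row[j] + (c1 != c2)  # diagonal cell plus cost
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]
```

A few known pairs are handy for comparing implementations: "kitten" and "sitting" are 3 edits apart, "flaw" and "lawn" are 2 apart, and the distance from an empty string to any string is that string's length.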

problem with arithmetic using logarithms to avoid numerical underflow

I have two lists of fractions;
say A = [ 1/212, 5/212, 3/212, ... ]
and B = [ 4/143, 7/143, 2/143, ... ].
If we define A' = a[0] * a[1] * a[2] * ... and B' = b[0] * b[1] * b[2] * ...
I want to calculate the values of A' / B',
My trouble is that A and B are both quite long and each value is small, so calculating the product causes numerical underflow very quickly...
I understand turning the product into a sum through logarithms can help me determine which of A' or B' is greater
ie max( log(a[0])+log(a[1])+..., log(b[0])+log(b[1])+... )
but I need the actual ratio....
My best bet to date is to keep the number representations as fractions, ie A = [ [1,212], [5,212], [3,212], ... ] and implement my own arithmetic but it's getting clumsy and I have a feeling there is a (simple) way of logarithms I'm just missing....
The numerators for A and B don't come from a sequence. They might as well be random for the purpose of this question. If it helps the denominators for all values in A are the same, as are all the denominators for B.
Any ideas most welcome!
Mat
You could calculate it in a slightly different order:
A' / B' = a[0] / b[0] * a[1] / b[1] * a[2] / b[2] * ...
If you want to keep it in logarithms, remember that A/B corresponds to log A - log B, so after you've summed the logarithms of A and B, you can find the ratio of the larger to the smaller by exponentiating your log base with max(logsumA, logsumB)-min(logsumA,logsumB).
Strip out the numerators and denominators since they are the same for the whole sequence. Compute the ratio of numerators element-by-element (rather as @Mark suggests), finally multiply the result by the right power of the denominator-of-B/denominator-of-A.
Or, if that threatens integer overflow in computing the product of the numerators or powers of the denominators, something like:
A'/B' = (numerator(A[0])/numerator(B[0])) * (denominator(B)/denominator(A)) * ...
I've probably got some of the fractions upside-down, but I guess you can figure that out ?
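The log-domain route can be sketched in a few lines of Python. The idea is that log(A'/B') is the sum of the log(a[i]) minus the sum of the log(b[i]), so neither product is ever formed. The helper names are invented for this example, and the second function is an optional extra for the case where the ratio itself is too large or too small for a float: it reports the ratio as a mantissa and a power of ten.

```python
import math

def log_ratio(a, b):
    """Return log(A'/B') as sum(log a_i) - sum(log b_i)."""
    return math.fsum(math.log(x) for x in a) - math.fsum(math.log(x) for x in b)

def mantissa_exponent(log_r):
    """Express exp(log_r) as (mantissa, exponent), meaning mantissa * 10**exponent."""
    log10_r = log_r / math.log(10)
    exponent = math.floor(log10_r)
    mantissa = 10 ** (log10_r - exponent)
    return mantissa, exponent

# Sample values in the style of the question.
a = [1 / 212, 5 / 212, 3 / 212]
b = [4 / 143, 7 / 143, 2 / 143]
ratio = math.exp(log_ratio(a, b))   # fine here because the ratio itself is moderate
```

If the ratio is of moderate size, `math.exp` of the log ratio gives it directly even when A' and B' would each underflow on their own. If the ratio itself is extreme, the mantissa and exponent pair keeps it usable.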