How to add new item with rating in ALS? - pyspark

I am using collaborative filtering with ALS (SparkML) and I want to recommend similar items to a new item that is coming with user rating. Can I transform the new item using the factor representation of ALS and use cosine similarity to suggest similar items?

Related

Multiple object tracking using radar data and extended kalman filter

thanks in advance.
I am new to the multiple object tracking field. So, I have been working on this for a couple of days. I have developed my first version of a single object tracker using an extended Kalman filter. I am estimating position, velocity by assuming a constant acceleration model. Now my question is how can I convert the existing model for multiple objects tracking. The main problem is I am using radar data. So, I am not able to get the references for developing the tracker. So, One good example or steps to achieve can help me in understanding the concept.
The answer to this question depends on a lot of things. For example, how much control and knowledge do you have over the whole system? If you know how many targets you need to track you can add all of them to the Kalman Filter state and for every measurement you perform data association to find out to which object a given measurement belongs. An easy association metric would be nearest neighbor.
If you don't know how many targets there will be you will want to implement a track management where each target you are tracking represents a track and you can model birth and death probabilities of targets.
Multi Target Tracking is a vast field and if you want to have an in-depth mathematical introduction I would recommend the 2015 survey paper "Multitarget Tracking" by Ba-Ngu Vo et al. You should be able to find a preprint pdf online.
If you are looking more for a lightweight tutorial I would assume it should be possible to find some tutorial or example code online where to start. As mentioned in the first paragraph, nearest neighbor association for a fixed amount of objects might be a good first step.

Sum Filtered Data in Tableau

I have a database of users and each user record has "User ID" and "Group". After filtering out a chunk of the records, I'd like to sum the number of users within each group. Currently I am doing that with the calculation:
{FIXED[Group]:SUM([Number of Records])}
The problem here is this calculation appears to ignore any records that I've filtered out and just gives a total count per group from all of the unfiltered data.
Is there a quick way to sum the number of visible users in each group after applying a filter?
The easiest way of solving this would be to take advantage of the order of operations in Tableau.
The issue you are having at the moment is the LOD calculation is performed prior to a dimension filter.
If you want to calculate a field at a different level of detail then the view than a LOD is still the way to go. All you need to do is force tableau to apply the filters before calculating the fixed calculation.
In order to do this change your filters to a context filter. This is done by right clicking on the filter and selecting "Add to context. You will see the filter change from blue to grey.
Your calculated field should now be sensitive to any context filters.
Find out more here

Matrix factorization based recommendation for like/dislike/unknown data

Most literature focus on either explicit rating data or implicit (like/unknown) data. Are there any good publications to handle like/dislike/unknown data? That is, in the data matrix there are three values, and I'd like to recommend from unknown entries.
And are there any good open source implementations on this?
Thanks.
With like and dislike, you already have explicit rating data. You can use standard collaborative filtering with user and item normalization. You can also check out OrdRec: An Ordinal Model for Predicting Personalized Item Rating Distributions, which just takes an ordinal ranking of item ratings. That is, you can say that Like is better than Dislike, and let the algorithm figure out the best ranking-to-rating mapping before doing standard item-item collaborative filtering. Download LensKit and use the included OrdRec algorithm.

Accuracy of Recommendation for a User from Lenskit Recommender

I'm using the algorithm UserUserItemScorer is possible to obtain the accuracy of recommendation, ie, the quality score of the recommended item. The only way I found was the value of "score". Is there another way besides the "score" method?
[disclaimer: LensKit lead developer]
First, a terminology thing: in recommender systems, the score and the accuracy of the recommendation are very different things. The score is how relevant the recommender thinks the item is, and is the basis for doing recommendation; the accuracy of the recommendation is how well that score models the user's actual opinion of the item.
I'll move forward assuming that you're looking for ways to get the score for an item.
There are at least three ways:
Call score on ItemScorer for individual items. This is very slow for multiple items.
Call score on ItemScorer with a batch of items. This is usually much faster. However, if you got the items from an ItemRecommender, then you are probably repeating computations.
The ItemRecommender returns a list of ‘scored IDs’, which are item IDs associated with scores. The getScore() method on the item recommender will get the score for each item.
But in general, the item scorer's score is exactly how you get relevance estimates from LensKit. The scores returned by an ItemRecommender are usually just the scores provided by the underlying item scorer.

Mahout Log Likelihood similarity metric behaviour

The problem I'm trying to solve is finding the right similarity metric, rescorer heuristic and filtration level for my data. (I'm using 'filtration level' to mean the amount of ratings that a user or item must have associated with it to make it into the production database).
Setup
I'm using mahout's taste collaborative filtering framework. My data comes in the form of triplets where an item's rating are contained in the set {1,2,3,4,5}. I'm using an itemBased recommender atop a logLikelihood similarity metric. I filter out users who rate fewer than 20 items from the production dataset. RMSE looks good (1.17ish) and there is no data capping going on, but there is an odd behavior that is undesireable and borders on error-like.
Question
First Call -- Generate a 'top items' list with no info from the user. To do this I use, what I call, a Centered Sum:
for i in items
for r in i's ratings
sum += r - center
where center = (5+1)/2 , if you allow ratings in the scale of 1 to 5 for example
I use a centered sum instead of average ratings to generate a top items list mainly because I want the number of ratings that an item has received to factor into the ranking.
Second Call -- I ask for 9 similar items to each of the top items returned in the first call. For each top item I asked for similar items for, 7 out of 9 of the similar items returned are the same (as the similar items set returned for the other top items)!
Is it about time to try some rescoring? Maybe multiplying the similarity of two games by (number of co-rated items)/x, where x is tuned (around 50 or something to begin with).
Thanks in advance fellas
You are asking for 50 items similar to some item X. Then you look for 9 similar items for each of those 50. And most of them are the same. Why is that surprising? Similar items ought to be similar to the same other items.
What's a "centered" sum? ranking by sum rather than average still gives you a relatively similar output if the number of items in the sum for each calculation is roughly similar.
What problem are you trying to solve? Because none of this seems to have a bearing on the recommender system you describe that you're using and works. Log-likelihood similarity is not even based on ratings.