Assigning final scores to identified technologies using users' StackOverflow reputation, badge counts, post scores, number of posts, and post dates

I am trying to determine how much importance to give various factors when assigning a score to technologies identified from a user's StackOverflow post tags and content.
The factors under consideration are the user's reputation score, badge counts (gold, silver, bronze), post scores, number of posts, and post date. I want to give more weight to technologies identified from recent posts and to users who have earned gold badges.
I have already identified technologies using post tags and titles, but I am having trouble assigning appropriate weights to each factor when calculating the final technology expertise score. Can you suggest a way to assign a final score to the identified technologies without hand-picking a specific weight for each factor?

Related

Schema for golf tournament scores

I work for a small non-profit organization and we are looking for a way to quickly tally scores and provide a few statistics for donors at our annual golf tournament. I thought this would be fairly easy, but I'm struggling to come up with a database schema to capture the scores. I can't figure out how the player's score relates to the specific hole on the course.
This is the diagram that I have so far. Am I way off base with this?
The Schema can be found here: https://app.quickdatabasediagrams.com/#/schema/forneGJp40inm7rWlf2Sbg
Perhaps make the Scores table a m:n join table between Players and Holes to capture each player's score on each hole. This is depicted on the diagram below. To get the score for a round you'd sum all scores for all holes with a specific CourseId, for a specific event.
I also denormalised it a little, adding a total score to the Rounds table. This means that you don't need to SUM() the individual scores every time to get the tallies for each Player's Round. That's just a suggestion for performance optimisation.
Source: https://app.quickdatabasediagrams.com/#/schema/x_amshIckkeGp8KAKEAmLQ
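As a rough sketch of the join-table idea (not the exact schema in the diagram), here is a small Python/sqlite3 version. The table names Players, Holes and Scores and the CourseId column follow the description above; the other columns (EventId, HoleNumber, Par, Strokes), the helper round_totals and the sample data are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Players (PlayerId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Holes (
    HoleId     INTEGER PRIMARY KEY,
    CourseId   INTEGER NOT NULL,
    HoleNumber INTEGER NOT NULL,
    Par        INTEGER NOT NULL
);
-- Join table: one row per player, per hole, per event (assumed columns).
CREATE TABLE Scores (
    EventId  INTEGER NOT NULL,
    PlayerId INTEGER NOT NULL REFERENCES Players(PlayerId),
    HoleId   INTEGER NOT NULL REFERENCES Holes(HoleId),
    Strokes  INTEGER NOT NULL,
    PRIMARY KEY (EventId, PlayerId, HoleId)
);
""")

# Made-up sample data: two players, three holes on course 10, one event.
conn.executemany("INSERT INTO Players VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany(
    "INSERT INTO Holes VALUES (?, ?, ?, ?)",
    [(1, 10, 1, 4), (2, 10, 2, 3), (3, 10, 3, 5)],
)
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 4), (1, 1, 2, 3), (1, 1, 3, 5),
     (1, 2, 1, 5), (1, 2, 2, 4), (1, 2, 3, 4)],
)


def round_totals(conn, event_id, course_id):
    """Sum each player's strokes over all holes of one course in one event."""
    rows = conn.execute(
        """
        SELECT p.Name, SUM(s.Strokes) AS Total
        FROM Scores AS s
        JOIN Holes AS h ON h.HoleId = s.HoleId
        JOIN Players AS p ON p.PlayerId = s.PlayerId
        WHERE s.EventId = ? AND h.CourseId = ?
        GROUP BY p.PlayerId, p.Name
        ORDER BY Total, p.Name
        """,
        (event_id, course_id),
    )
    return rows.fetchall()


print(round_totals(conn, event_id=1, course_id=10))
```

If the totals are also stored on the Rounds table, as suggested above, this query is only needed when a round is finalised or corrected.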
If it is possible to play the same course twice in the same event (the first and last matches could be on the same course, for example) then you should provide for that.
I have two other suggestions:
I think the relationship between Events and Venues is in the wrong direction.
I suggest splitting Players into two tables. One representing the human being, and the other representing the human's participation in the round. Perhaps "Person" and "Contestant" would be good names.

Reporter: Contact duration and gap times

I've been struggling with this for the past week. I would like to build three reporters (so I can extract this information) for:
The duration of contacts between pairs of agents (i and j).
The gap between consecutive contacts between pairs of agents (i and j).
Number of contacts that an agent has.
If you can give a (small) push in the right direction, I would be grateful!
If I have interpreted this correctly, this is something I would probably do with links (though the table suggestion by @Alan may be quicker). Create a link between pairs of agents as they make contact, and give the link attributes such as duration, time (tick) of previous contact, maximum time between contacts, and number of contacts.
The problem is that the number of ties can reach N(N-1)/2, where N is the number of agents. For large N, I suspect this would be fairly slow, at least to create the links. If you are expecting a dense network, with most agents contacting each other, then create all the links during setup and simply update the attributes. If you are expecting a sparse network, with each agent contacting only a limited number of others, create the link at initial contact.
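To make the bookkeeping concrete, here is a language-neutral sketch in Python rather than NetLogo. It follows the sparse-network variant: a record for a pair is created only at first contact. The class ContactTracker, its attribute names, and the convention that a contact ends at the first tick where the pair is no longer in contact are all assumptions for illustration.

```python
from collections import defaultdict


class ContactTracker:
    """Per-pair contact bookkeeping, created lazily at first contact."""

    def __init__(self):
        self.start = {}                         # pair -> tick the ongoing contact began
        self.last_end = {}                      # pair -> tick the previous contact ended
        self.durations = defaultdict(list)      # pair -> durations of finished contacts
        self.gaps = defaultdict(list)           # pair -> gaps between consecutive contacts
        self.contact_count = defaultdict(int)   # agent -> number of contacts started

    def update(self, tick, pairs_in_contact):
        """Call once per tick with the pairs (i, j) currently in contact."""
        # Store pairs in sorted order so (i, j) and (j, i) are the same key.
        current = {tuple(sorted(pair)) for pair in pairs_in_contact}

        # Contacts that begin on this tick.
        for pair in current:
            if pair not in self.start:
                self.start[pair] = tick
                if pair in self.last_end:
                    self.gaps[pair].append(tick - self.last_end[pair])
                for agent in pair:
                    self.contact_count[agent] += 1

        # Contacts that were ongoing but are absent on this tick.
        ended = [pair for pair in self.start if pair not in current]
        for pair in ended:
            self.durations[pair].append(tick - self.start.pop(pair))
            self.last_end[pair] = tick


tracker = ContactTracker()
# Made-up timeline: which pairs are in contact at ticks 0..7.
timeline = [[], [(0, 1)], [(0, 1)], [(1, 0)], [], [], [(0, 1), (0, 2)], []]
for tick, pairs in enumerate(timeline):
    tracker.update(tick, pairs)

print(dict(tracker.durations))
print(dict(tracker.gaps))
print(dict(tracker.contact_count))
```

In NetLogo the same fields would live on the link as link variables, and the three reporters would simply read them.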

Sort by Section of Sum Field in Tableau

I'm new to Tableau Desktop, so I'm guessing what I want to do is simple, but I don't know how to do it.
Basically, I have basketball data that gives me players' total points scored over several seasons with different NBA teams. I'm trying to sort that data by team, based on the number of points each player scored for that specific team.
Right now, I have the data sorted by team, player, and the total number of points scored. The problem is - I don't actually want the total sum. (E.g. right now Shaq is listed first under the Celtics because he has the most career points out of anyone who played for the Celtics, but not for the Celtics themselves.)
Can someone tell me how I would go about sorting by the sum of points scored for each team?
This is actually such a common issue that Tableau references a solution in their official training material. If my understanding of your requirement is correct, the following should solve the problem.
http://kb.tableau.com/articles/knowledgebase/finding-top-n-within-category

Accuracy of Recommendation for a User from Lenskit Recommender

I'm using the UserUserItemScorer algorithm. Is it possible to obtain the accuracy of a recommendation, i.e., the quality score of the recommended item? The only way I found was the value of "score". Is there another way besides the "score" method?
[disclaimer: LensKit lead developer]
First, a terminology thing: in recommender systems, the score and the accuracy of the recommendation are very different things. The score is how relevant the recommender thinks the item is, and is the basis for doing recommendation; the accuracy of the recommendation is how well that score models the user's actual opinion of the item.
I'll move forward assuming that you're looking for ways to get the score for an item.
There are at least three ways:
Call score on ItemScorer for individual items. This is very slow for multiple items.
Call score on ItemScorer with a batch of items. This is usually much faster. However, if you got the items from an ItemRecommender, then you are probably repeating computations.
The ItemRecommender returns a list of ‘scored IDs’, which are item IDs associated with scores. The getScore() method on each scored ID returns the score for that item.
But in general, the item scorer's score is exactly how you get relevance estimates from LensKit. The scores returned by an ItemRecommender are usually just the scores provided by the underlying item scorer.

How to generate recommendation with matrix factorization

I've read some papers on matrix factorization (latent factor models) in recommender systems, and I can implement the algorithm. I get RMSE results on the MovieLens dataset similar to those reported in the papers.
However, I find that if I try to generate a top-K (e.g. K=10) list of recommended movies for every user by ranking the predicted ratings, the top-ranked movies seem to be the same for all users.
Is that just how it works, or have I got something wrong?
This is a known problem in recommendation.
It is sometimes called the "Harry Potter" effect: (almost) everybody likes Harry Potter.
So most automated procedures will find out which items are generally popular, and recommend those to the users.
You can either filter out very popular items, or multiply the predicted rating by a factor that is lower the more globally popular an item is.
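Both options can be sketched in a few lines of Python. This is only an illustration: the function rerank, the damping factor 1 / (1 + alpha * log(1 + count)), the parameters alpha and max_count, and the sample numbers are all assumptions, not a standard formula. Any factor that shrinks as the rating count grows would fit the description above.

```python
import math


def rerank(predicted, rating_counts, k=10, alpha=0.5, max_count=None):
    """Return the top-k item ids for one user after a popularity adjustment.

    predicted     -- dict: item id -> predicted rating for this user
    rating_counts -- dict: item id -> number of ratings the item has overall
    alpha         -- strength of the penalty (0 disables it); assumed parameter
    max_count     -- if set, items with more ratings than this are filtered out
    """
    adjusted = {}
    for item, rating in predicted.items():
        count = rating_counts.get(item, 0)
        if max_count is not None and count > max_count:
            continue  # option 1: drop very popular items entirely
        # Option 2: multiply by a factor that shrinks as popularity grows.
        factor = 1.0 / (1.0 + alpha * math.log1p(count))
        adjusted[item] = rating * factor
    return sorted(adjusted, key=adjusted.get, reverse=True)[:k]


# Made-up example: one blockbuster and two niche items.
predicted = {"harry_potter": 4.8, "niche_a": 4.5, "niche_b": 4.0}
rating_counts = {"harry_potter": 10000, "niche_a": 50, "niche_b": 5}

print(rerank(predicted, rating_counts, k=2, alpha=0.0))
print(rerank(predicted, rating_counts, k=2, alpha=0.5))
print(rerank(predicted, rating_counts, k=2, alpha=0.0, max_count=1000))
```

How strong the penalty should be is a tuning question; too large a value of alpha just swaps "everyone gets the blockbusters" for "everyone gets obscure items".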