Schema for golf tournament scores - postgresql

I work for a small non-profit organization and we are looking for a way to quickly tally scores and provide a few statistics for donors at our annual golf tournament. I thought this would be fairly easy, but I'm struggling to come up with a database schema to capture the scores. I can't figure out how the player's score relates to the specific hole on the course.
This is the diagram that I have so far. Am I way off base with this?
The schema can be found here: https://app.quickdatabasediagrams.com/#/schema/forneGJp40inm7rWlf2Sbg

Perhaps make the Scores table an m:n join table between Players and Holes, to capture each player's score on each hole. This is depicted in the diagram linked below. To get the score for a round, you'd sum the scores for all holes with a specific CourseId, for a specific event.
I also denormalised it a little by adding a total score to the Rounds table. This means you don't need to SUM() the individual scores every time you want the tally for each player's round. That's just a suggestion for performance optimisation.
Source: https://app.quickdatabasediagrams.com/#/schema/x_amshIckkeGp8KAKEAmLQ
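Here is a minimal sketch of the join-table idea. The table and column names are hypothetical and not copied from the linked diagram. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server. Scores has one row per event, player and hole, and a round's tally is the SUM over the holes of one CourseId in one event:

```python
import sqlite3

# Hypothetical schema: names are illustrative, not copied from the linked diagram.
SCHEMA = """
CREATE TABLE Courses (CourseId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Holes (
    HoleId   INTEGER PRIMARY KEY,
    CourseId INTEGER NOT NULL REFERENCES Courses(CourseId),
    Number   INTEGER NOT NULL,
    Par      INTEGER NOT NULL,
    UNIQUE (CourseId, Number)
);
CREATE TABLE Events  (EventId  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Players (PlayerId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- m:n join table between Players and Holes: one row per event, player and hole
CREATE TABLE Scores (
    EventId  INTEGER NOT NULL REFERENCES Events(EventId),
    PlayerId INTEGER NOT NULL REFERENCES Players(PlayerId),
    HoleId   INTEGER NOT NULL REFERENCES Holes(HoleId),
    Strokes  INTEGER NOT NULL CHECK (Strokes > 0),
    PRIMARY KEY (EventId, PlayerId, HoleId)
);
"""

# Round tally: sum every hole of one course, for one event, per player
ROUND_TOTALS = """
SELECT p.Name, SUM(s.Strokes) AS Total
FROM Scores s
JOIN Holes h   ON h.HoleId = s.HoleId
JOIN Players p ON p.PlayerId = s.PlayerId
WHERE s.EventId = ? AND h.CourseId = ?
GROUP BY p.PlayerId, p.Name
ORDER BY Total, p.Name
"""

def round_totals(conn, event_id, course_id):
    return conn.execute(ROUND_TOTALS, (event_id, course_id)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up sample data: a 3-hole course, one event, two players
conn.execute("INSERT INTO Courses VALUES (1, 'Sample Course')")
conn.executemany("INSERT INTO Holes VALUES (?, 1, ?, ?)", [(1, 1, 4), (2, 2, 3), (3, 3, 5)])
conn.execute("INSERT INTO Events VALUES (1, 'Annual Tournament')")
conn.executemany("INSERT INTO Players VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany(
    "INSERT INTO Scores VALUES (1, ?, ?, ?)",  # (PlayerId, HoleId, Strokes) for event 1
    [(1, 1, 4), (1, 2, 3), (1, 3, 6), (2, 1, 5), (2, 2, 4), (2, 3, 5)],
)
print(round_totals(conn, event_id=1, course_id=1))  # [('Ann', 13), ('Bob', 14)]
```

The DDL is plain SQL that PostgreSQL also accepts. With psycopg you would write `%s` placeholders instead of sqlite3's `?`.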
If it is possible to play the same course twice in the same event (the first and last matches could be on the same course, for example), then you should provide for that.
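If a course can be played twice in one event, one option is to give each scorecard its own Rounds row and key Scores on (RoundId, HoleId). This is an assumption, not something the linked diagram shows. The sketch also keeps the denormalised TotalScore in sync using triggers. It again uses sqlite3 with hypothetical names:

```python
import sqlite3

# Hypothetical names. Scores hang off a Rounds row, so the same player can play
# the same course twice in one event (two Rounds rows, two separate scorecards).
SCHEMA = """
CREATE TABLE Holes (HoleId INTEGER PRIMARY KEY, CourseId INTEGER NOT NULL, Number INTEGER NOT NULL);
CREATE TABLE Rounds (
    RoundId    INTEGER PRIMARY KEY,
    EventId    INTEGER NOT NULL,
    PlayerId   INTEGER NOT NULL,
    CourseId   INTEGER NOT NULL,
    RoundNo    INTEGER NOT NULL,            -- 1st, 2nd, ... round of this player in this event
    TotalScore INTEGER NOT NULL DEFAULT 0,  -- denormalised SUM of Scores.Strokes
    UNIQUE (EventId, PlayerId, RoundNo)
);
CREATE TABLE Scores (
    RoundId INTEGER NOT NULL REFERENCES Rounds(RoundId),
    HoleId  INTEGER NOT NULL REFERENCES Holes(HoleId),
    Strokes INTEGER NOT NULL,
    PRIMARY KEY (RoundId, HoleId)           -- one entry per hole per round
);
-- Keep TotalScore in sync by recomputing the affected rounds after every change
CREATE TRIGGER scores_ai AFTER INSERT ON Scores BEGIN
    UPDATE Rounds SET TotalScore = (SELECT COALESCE(SUM(s.Strokes), 0) FROM Scores s
                                    WHERE s.RoundId = Rounds.RoundId)
    WHERE RoundId = NEW.RoundId;
END;
CREATE TRIGGER scores_au AFTER UPDATE ON Scores BEGIN
    UPDATE Rounds SET TotalScore = (SELECT COALESCE(SUM(s.Strokes), 0) FROM Scores s
                                    WHERE s.RoundId = Rounds.RoundId)
    WHERE RoundId IN (OLD.RoundId, NEW.RoundId);
END;
CREATE TRIGGER scores_ad AFTER DELETE ON Scores BEGIN
    UPDATE Rounds SET TotalScore = (SELECT COALESCE(SUM(s.Strokes), 0) FROM Scores s
                                    WHERE s.RoundId = Rounds.RoundId)
    WHERE RoundId = OLD.RoundId;
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Holes VALUES (?, 1, ?)", [(1, 1), (2, 2), (3, 3)])
# Player 1 plays course 1 twice in event 1
conn.executemany(
    "INSERT INTO Rounds (RoundId, EventId, PlayerId, CourseId, RoundNo) VALUES (?, 1, 1, 1, ?)",
    [(1, 1), (2, 2)],
)
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?)",  # (RoundId, HoleId, Strokes)
    [(1, 1, 4), (1, 2, 3), (1, 3, 6), (2, 1, 5), (2, 2, 4), (2, 3, 5)],
)
conn.execute("UPDATE Scores SET Strokes = 2 WHERE RoundId = 2 AND HoleId = 2")  # fix a scorecard typo
print(conn.execute("SELECT RoundId, TotalScore FROM Rounds ORDER BY RoundId").fetchall())
# [(1, 13), (2, 12)]
```

SQLite triggers contain the SQL directly. In PostgreSQL, a trigger executes a function declared as `RETURNS trigger` (typically PL/pgSQL), so that part of the DDL would differ. If SUM() is fast enough for your data volume, a plain view over Scores avoids the triggers altogether.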
I have two other suggestions:
I think the relationship between Events and Venues is in the wrong direction.
I suggest splitting Players into two tables. One representing the human being, and the other representing the human's participation in the round. Perhaps "Person" and "Contestant" would be good names.
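The two suggestions above could look roughly like the sketch below. It assumes the intended direction is that each event references the venue it is held at. It also models a Contestant as one person's participation in one event. The answer says "round", so key it on your rounds table instead if that is what you track. Names are hypothetical, and sqlite3 stands in for PostgreSQL:

```python
import sqlite3

# Hypothetical names. Events reference Venues (many events per venue), and the
# human being (Persons) is stored separately from their participation (Contestants).
SCHEMA = """
CREATE TABLE Venues  (VenueId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Events  (
    EventId INTEGER PRIMARY KEY,
    VenueId INTEGER NOT NULL REFERENCES Venues(VenueId),
    Year    INTEGER NOT NULL
);
CREATE TABLE Persons (PersonId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Contestants (
    ContestantId INTEGER PRIMARY KEY,
    PersonId     INTEGER NOT NULL REFERENCES Persons(PersonId),
    EventId      INTEGER NOT NULL REFERENCES Events(EventId),
    UNIQUE (PersonId, EventId)  -- a person takes part in a given event at most once
);
"""

def events_per_person(conn):
    """How many events each person has played in, across all years."""
    return conn.execute(
        """SELECT p.Name, COUNT(c.ContestantId)
           FROM Persons p LEFT JOIN Contestants c ON c.PersonId = p.PersonId
           GROUP BY p.PersonId, p.Name
           ORDER BY p.Name"""
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up data: one venue hosting two annual events
conn.execute("INSERT INTO Venues VALUES (1, 'Sample Golf Club')")
conn.executemany("INSERT INTO Events VALUES (?, 1, ?)", [(1, 2019), (2, 2020)])
conn.executemany("INSERT INTO Persons VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO Contestants VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 2), (3, 2, 2)])
print(events_per_person(conn))  # [('Ann', 2), ('Bob', 1)]
```

With this split, a returning donor's details are stored once in Persons and reused every year, while each year's participation is a new Contestants row.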

Related

What is a suitable data mining model to find the best Hospital?

I have a hospital ratings dataset and need to find the best hospital for when I have just broken my leg. What is the best data mining model I can use, and how do I find out which model is better?
https://www.kaggle.com/center-for-medicare-and-medicaid/hospital-ratings#=
This is really up to you to design. You need to attach a weight to each of the variables you have; the weight is how you express how important that variable is.
Is the hospital location a limiting factor? Maybe you can only hobble 5 miles on your broken leg, or maybe you're a baller and can book your private jet to Hollywood.
If you don't have a way to call an API that determines the distance between your location and the hospital's address, then you'll just have to throw out location altogether.
If you just broke your leg, timeliness of care is probably pretty important. But if you want to get a boob job, then you probably don't mind waiting a month or two as long as it's done really well.
In this case, effectiveness of care is probably the most valuable variable. I would start with just that, then work on adding in more variables and refining your answer. What happens if two hospitals have equally good effectiveness? Then patient satisfaction might be the next most important, etc.
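Here is a minimal sketch of both ideas in the answer. The first ranks by a weighted sum over the variables. The second ranks by the most important variable and uses later ones only to break ties. The field names, the 0-1 scale and the weights are all hypothetical. You would first have to map the dataset's ratings onto numbers yourself:

```python
from dataclasses import dataclass

@dataclass
class Hospital:
    name: str
    effectiveness: float  # hypothetical 0-1 scores derived from the ratings
    timeliness: float
    satisfaction: float

# Hypothetical weights expressing how much each variable matters to you
WEIGHTS = {"effectiveness": 0.6, "timeliness": 0.3, "satisfaction": 0.1}

def weighted_score(hospital, weights):
    return sum(w * getattr(hospital, field) for field, w in weights.items())

def rank_weighted(hospitals, weights):
    """Best first, by weighted sum."""
    return sorted(hospitals, key=lambda h: weighted_score(h, weights), reverse=True)

def rank_lexicographic(hospitals, fields):
    """Best first by fields[0]; later fields only break ties."""
    return sorted(hospitals, key=lambda h: tuple(getattr(h, f) for f in fields), reverse=True)

HOSPITALS = [  # made-up example values
    Hospital("Hospital A", effectiveness=0.9, timeliness=0.4, satisfaction=0.7),
    Hospital("Hospital B", effectiveness=0.9, timeliness=0.8, satisfaction=0.5),
    Hospital("Hospital C", effectiveness=0.7, timeliness=0.9, satisfaction=0.9),
]

print([h.name for h in rank_weighted(HOSPITALS, WEIGHTS)])
# ['Hospital B', 'Hospital C', 'Hospital A']
print([h.name for h in rank_lexicographic(HOSPITALS, ["effectiveness", "satisfaction"])])
# ['Hospital A', 'Hospital B', 'Hospital C']
```

Changing WEIGHTS changes the ranking. That is the point of the approach: the weights encode your own priorities.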

Sort by Section of Sum Field in Tableau

I'm new to Tableau Desktop, so I'm guessing what I want to do is simple, but I don't know how to do it.
Basically, I have basketball data that gives me players' total points scored over several seasons with different NBA teams. I'm trying to sort that data by team, based on the number of points each player scored for each specific team.
Right now, I have the data sorted by team, player, and the total number of points scored. The problem is that I don't actually want the total sum. (E.g. right now Shaq is listed first under the Celtics because he has the most career points of anyone who played for the Celtics, even though he didn't score the most points for the Celtics themselves.)
Can someone tell me how I would go about sorting by the sum of points scored for each team?
This is actually such a common issue that Tableau references a solution in their official training material. If my understanding of your requirement is correct, the following should solve the problem.
http://kb.tableau.com/articles/knowledgebase/finding-top-n-within-category
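The linked article covers the Tableau side. As a language-neutral illustration of what the sort has to compute (this is not Tableau syntax), the sketch sums points per (team, player) pair before ranking within each team. That way, a player's points for other teams no longer count. The rows and names are made up:

```python
from collections import defaultdict

# Made-up rows: (player, team, points), e.g. one row per player per season
ROWS = [
    ("Player A", "Celtics", 500), ("Player A", "Lakers", 9000),
    ("Player B", "Celtics", 3000),
    ("Player C", "Celtics", 1200), ("Player C", "Celtics", 1500),
]

def points_by_team(rows):
    """Sum points per (team, player), counting only points scored for that team."""
    totals = defaultdict(int)
    for player, team, points in rows:
        totals[(team, player)] += points
    return totals

def top_n_per_team(rows, n):
    """For each team, the n players who scored the most points for that team."""
    per_team = defaultdict(list)
    for (team, player), pts in points_by_team(rows).items():
        per_team[team].append((player, pts))
    return {team: sorted(players, key=lambda p: p[1], reverse=True)[:n]
            for team, players in per_team.items()}

print(top_n_per_team(ROWS, 2))
```

Player A has the most career points here, but under the Celtics he drops to the bottom because only his 500 Celtics points are counted.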

Analyse abstract data

I need to process a lot of CSV files that contain three columns: date, TV channel ID, movie ID.
Based on those columns, I need to classify the genre of each movie and the genre of each TV channel.
I'm new to big data processing and was wondering how I can classify this data if I only have an ID (I cannot use another source to look up the ID, or generate random data to train my algorithm).
The solution I came up with is to define some ranges of hours and assign the movies played within each range to a genre. For example:
movies that are played between 01:00-04:00, genre 1;
movies that are played between 04:01-06:00, genre 2;
etc.
After classifying the movies, I can classify the TV channels based on the movies they have played.
I'm planning to do this with Spark :)
Does anyone have another solution or any advice? It's kind of hard because the data looks so abstract.
Thank you
When you say "I need to classify the genre of the movie", do you mean "Drama", "Comedy", "Action", or "Genre1", "Genre2"? In what follows I'll assume the second case.
Don't assign a genre by hand: use a clustering algorithm
First, I wouldn't assign a genre based only on the time when the movie is played. Generally speaking, I would advise against doing the clustering by hand, since that is exactly what clustering algorithms are made for: they use features to group individuals that are, in some way, related to each other.
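To make "use a clustering algorithm" concrete, here is plain k-means (Lloyd's algorithm) written out by hand. It runs on feature vectors, one per row. This is only a sketch; in Spark you would normally use MLlib's `pyspark.ml.clustering.KMeans` rather than writing your own:

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label (0..k-1) per point."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # k distinct rows as initial centres
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

# Two obvious groups of made-up 2-D feature vectors
rows = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(rows, k=2))
```

The label numbers themselves are arbitrary ("Genre1", "Genre2", ...). Only which rows share a label matters.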
In your case, there is a tricky part: each data point (row) is not a movie but one airing of a movie. Thus, a movie might end up in different clusters, meaning it would have different genres.
There are several options:
A movie can belong to several genres, which is quite natural.
You can choose only one genre, based on the group in which the movie appears most frequently.
If you decide to assign multiple genres per movie, you might use a threshold: for instance, if a movie appears fewer than N times in a group, then it does not belong to that group (unless it is the only group it appears in).
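Given one cluster label per row (one airing), the single-genre option and the threshold rule above can be sketched as follows. `movie_ids`, `labels` and `min_count` (the N in the threshold rule) are hypothetical names:

```python
from collections import Counter, defaultdict

def genres_per_movie(movie_ids, labels, min_count):
    """movie_ids[i] / labels[i]: movie and cluster of row i.
    Returns (single, multi):
      single[m] = the cluster in which movie m appears most often (one genre per movie)
      multi[m]  = clusters where m appears >= min_count times, or its only cluster (threshold rule)"""
    counts = defaultdict(Counter)
    for movie, label in zip(movie_ids, labels):
        counts[movie][label] += 1
    single = {m: c.most_common(1)[0][0] for m, c in counts.items()}
    multi = {m: {lab for lab, n in c.items() if n >= min_count or len(c) == 1}
             for m, c in counts.items()}
    return single, multi

movie_ids = ["m1", "m1", "m1", "m2", "m2", "m3", "m4", "m4", "m4", "m4", "m4"]
labels    = [0,    0,    1,    1,    1,    2,    0,    0,    0,    2,    2]
single, multi = genres_per_movie(movie_ids, labels, min_count=2)
print(single)  # {'m1': 0, 'm2': 1, 'm3': 2, 'm4': 0}
print(multi)   # {'m1': {0}, 'm2': {1}, 'm3': {2}, 'm4': {0, 2}}
```

With the threshold rule as stated, a movie that appears in several groups but never reaches N in any of them ends up with no genre. You may want a fallback for that case.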
Create new features
You should design as many new features* as you can, to help the clustering algorithm separate the data well and create homogeneous clusters.
Off the top of my head, you could add:
A boolean feature for each time frame you consider (0:00-3:59; 4:00-6:00; ...). Only one of these features is 1 (the frame in which the movie is played); the others are 0.
A feature counting how many times the movie has been played (Men in Black is played more often than 12 Angry Men).
A feature counting how many channel IDs have played the movie (Star Wars is played on more channels than some Bollywood movies).
...
Think about how a genre is represented and played across all channels, and create the features accordingly.
PS: * Don't get me wrong: "as many features as you can" means more than your three features, but not so many that you run into what's called the curse of dimensionality.
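Here is a sketch of the three features suggested above, computed per row from (date, channel ID, movie ID). It assumes the date column parses as `YYYY-MM-DD HH:MM` and uses hypothetical time-frame boundaries. Adjust both to match your files:

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical time frames (hours): [0,4), [4,6), [6,12), [12,18), [18,24)
FRAME_STARTS = [0, 4, 6, 12, 18]

def frame_one_hot(hour):
    """One boolean per time frame; exactly one of them is 1."""
    idx = max(i for i, start in enumerate(FRAME_STARTS) if hour >= start)
    return [1 if i == idx else 0 for i in range(len(FRAME_STARTS))]

def build_features(rows):
    """rows: (timestamp 'YYYY-MM-DD HH:MM', channel_id, movie_id).
    Returns one vector per row: time-frame flags + play count + distinct channel count."""
    plays = Counter(movie for _, _, movie in rows)
    channels = defaultdict(set)
    for _, channel, movie in rows:
        channels[movie].add(channel)
    features = []
    for ts, _, movie in rows:
        hour = datetime.strptime(ts, "%Y-%m-%d %H:%M").hour
        features.append(frame_one_hot(hour) + [plays[movie], len(channels[movie])])
    return features

rows = [
    ("2017-05-01 02:30", "c1", "m1"),
    ("2017-05-02 05:15", "c2", "m1"),
    ("2017-05-02 20:00", "c1", "m2"),
]
for vector in build_features(rows):
    print(vector)
```

The counts are on a much larger scale than the 0/1 time-frame flags. For a distance-based algorithm such as k-means, you would typically standardise the columns before clustering.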

Complex SELECT in Tarantool

There are two spaces, named e.g. Company and Cars. The Company space has a company id field (primary index) and a geolocation (point) field (secondary index). The Cars space has a car field (primary index) and a companies field (an array of all companies where this car can be rented). I need to get the top 10 Companies within a specified rectangle where a specific car can be rented. What is the best way to achieve this?
Here I need to combine spatial and non-spatial indexes to get the result. My search plan is to look up the car tuple and get all its companies (there may be 1,000 of them), and then, in the other space, filter them down to 10 that are within the specified rectangle.
My use case is similar to this (it is not a rent-a-car use case), but the logic is the same. There will be many more Companies than Cars (millions of Companies and 300-500k Cars). How can I optimize my plan to get this information, and which indexes should I use? As you can see, a single select needs both spatial and non-spatial conditions.
I think the best strategy for this type of index would be to map your cars to points in another dimension, sufficiently far apart from each other. E.g. if your typical search is within a few square kilometers, make sure each car "coordinate" is at least a few dozen kilometers away from the nearest neighbour car. Then you can use our multi-dimensional RTREE index for the search.
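Here is a sketch of the mapping idea in plain Python. Each (company, car) pair becomes a 3-D point whose third coordinate is the car's index times a large spacing. A single box query then combines "this car" with "inside this rectangle". The linear scan only stands in for an R-tree; in Tarantool the points would go into an RTREE index created with `dimension = 3`. `SPACING` and all other names are hypothetical:

```python
SPACING = 1_000.0  # hypothetical gap between car "coordinates"; must dwarf any search box

def to_point(x, y, car_index):
    """Map one (company location, car) pair to a 3-D point: the car becomes a third coordinate."""
    return (x, y, car_index * SPACING)

def build_points(companies, cars):
    """companies: {company_id: (x, y)}; cars: {car_id: [company_id, ...]}.
    Returns (car_index, points) where points is a list of (point, company_id)."""
    car_index = {car: i for i, car in enumerate(sorted(cars))}
    points = [(to_point(*companies[cid], car_index[car]), cid)
              for car, cids in cars.items() for cid in cids]
    return car_index, points

def search(points, car_index, car, x1, y1, x2, y2, limit=10):
    """Box query [x1,x2] x [y1,y2] x [c,c]; a linear scan stands in for the RTREE index.
    Returns up to `limit` matching company ids (here simply the lowest ids)."""
    c = car_index[car] * SPACING
    lo, hi = (x1, y1, c), (x2, y2, c)
    hits = [cid for p, cid in points if all(l <= v <= h for l, v, h in zip(lo, p, hi))]
    return sorted(hits)[:limit]

companies = {1: (0.0, 0.0), 2: (5.0, 5.0), 3: (1.0, 1.0)}
cars = {"carA": [1, 2], "carB": [3]}
car_index, points = build_points(companies, cars)
print(search(points, car_index, "carA", 0, 0, 2, 2))  # [1]
```

This approach stores one point per (company, car) pair. The index size therefore grows with the total length of all the companies arrays.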

Accuracy of Recommendation for a User from Lenskit Recommender

I'm using the UserUserItemScorer algorithm. Is it possible to obtain the accuracy of a recommendation, i.e. the quality score of the recommended item? The only way I found was the value of "score". Is there another way besides the "score" method?
[disclaimer: LensKit lead developer]
First, a terminology thing: in recommender systems, the score and the accuracy of the recommendation are very different things. The score is how relevant the recommender thinks the item is, and is the basis for doing recommendation; the accuracy of the recommendation is how well that score models the user's actual opinion of the item.
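To illustrate the distinction: the score is the predicted value, while accuracy compares those predictions against ratings the user actually gave. Root-mean-square error (RMSE) is one common accuracy metric. The sketch below is a generic implementation, not LensKit's own evaluator, and the dictionaries are made-up data:

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted scores and the user's real ratings,
    computed over the items that appear in both dicts."""
    common = predicted.keys() & actual.keys()
    if not common:
        raise ValueError("no items with both a prediction and a rating")
    return math.sqrt(sum((predicted[i] - actual[i]) ** 2 for i in common) / len(common))

predicted = {1: 4.0, 2: 3.0, 3: 5.0}  # scores from a recommender (made up)
actual = {1: 5.0, 2: 3.0, 4: 2.0}     # ratings the user actually gave (made up)
print(rmse(predicted, actual))
```

Lower is better. Items without a held-out rating (item 3 here) cannot contribute to accuracy at all, which is why accuracy is normally measured offline on a test set.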
I'll move forward assuming that you're looking for ways to get the score for an item.
There are at least three ways:
Call score on ItemScorer for individual items. This is very slow for multiple items.
Call score on ItemScorer with a batch of items. This is usually much faster. However, if you got the items from an ItemRecommender, then you are probably repeating computations.
The ItemRecommender returns a list of ‘scored IDs’, which are item IDs associated with scores. Calling getScore() on each scored ID gives you that item's score.
But in general, the item scorer's score is exactly how you get relevance estimates from LensKit. The scores returned by an ItemRecommender are usually just the scores provided by the underlying item scorer.