I'm new to Tableau Desktop, so I'm guessing what I want to do is simple, but I don't know how to do it.
Basically, I have basketball data that gives me players' total points scored over several seasons with different NBA teams. I'm trying to sort that data by team, based on the amount that each player scored for each specific team.
Right now, I have the data sorted by team, player, and the total number of points scored. The problem is - I don't actually want the total sum. (E.g. right now Shaq is listed first under the Celtics because he has the most career points out of anyone who played for the Celtics, but not for the Celtics themselves.)
Can someone tell me how I would go about sorting by sum points by team?
This is actually such a common issue that Tableau references a solution in their official training material. If my understanding of your requirement is correct, the following should solve the problem.
http://kb.tableau.com/articles/knowledgebase/finding-top-n-within-category
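Outside Tableau, the underlying idea is to aggregate points per (team, player) pair first and only then rank players inside each team. Here is a minimal Python sketch of that logic; the rows, numbers, and field layout are made up for illustration and are not taken from the linked article.

```python
from collections import defaultdict

# Hypothetical rows: (player, team, season, points). All numbers are invented.
rows = [
    ("Shaq", "Lakers", 2000, 2000),
    ("Shaq", "Lakers", 2001, 2100),
    ("Shaq", "Celtics", 2011, 300),
    ("Pierce", "Celtics", 2010, 1500),
    ("Pierce", "Celtics", 2011, 1500),
]

def rank_within_team(rows):
    """Sum points per (team, player), then sort players inside each team."""
    totals = defaultdict(int)
    for player, team, _season, points in rows:
        totals[(team, player)] += points

    by_team = defaultdict(list)
    for (team, player), points in totals.items():
        by_team[team].append((player, points))

    # Highest scorer for that team first.
    return {
        team: sorted(players, key=lambda p: p[1], reverse=True)
        for team, players in by_team.items()
    }

ranking = rank_within_team(rows)
```

With this data, Shaq ranks below Pierce for the Celtics because only his Celtics points count there, even though his career total is higher.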
I am trying to build a forecast for interest expense for floating debt in my company.
I have been given a set of ResetDates which help me match a given rate based on when the ResetDate is.
I have been successful in forecasting one period, but I need a much longer set of periods to satisfy my requirements.
I've tried derive nodes and nested if statements as well as filler nodes.
This is the data I am given to work with; I can only look one ResetDate ahead.
Here is the data I used: columns A/B/C/D are what I'm given, and column E (the fifth column from the left) is what I want to derive as my output.
I want to use 'InterestPayDate' and derive:
if it's more than 'NextReset', then add 90 days to 'NextReset' to create 'NextReset2'
That is as far as I can get. My problem is that I then want to look at NextReset2 and derive:
if 'InterestPayDate' is more than 'NextReset2', then add 90 days to 'NextReset2', if it's less than 'NextReset2', keep the current value for 'NextReset2'
Output should look like Column E here
I'm not sure if I need to dig deeper into the logical functions. In all honesty, I've just picked up SPSS and I am really trying to learn. Hopefully, you can point me in the right direction.
Thank you.
After computing the first NextReset2, you need to use a Filler node like the one below to change the value of the field.
You might need more than one identical node like this: one for each potential 90-day period by which you want to extend the NextReset2 date. In your sample data, you will need at least two Filler nodes to get the correct value of NextReset2 for the last of the records.
There might be a more elegant way to do it, but this will work and it's easy enough to make copies of a node and string them together like this.
Please also see a sample IBM SPSS Modeler stream showing this approach here and using your sample data.
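To make the logic of the chained Filler nodes concrete, here is a minimal Python sketch of the same rule. It is not SPSS Modeler syntax. The function name is hypothetical, and the 90-day period and the "more than" comparison are taken from the question. Each pass through the loop corresponds to one Filler node in the chain.

```python
from datetime import date, timedelta

RESET_PERIOD = timedelta(days=90)  # period length stated in the question

def roll_reset(interest_pay_date, next_reset, period=RESET_PERIOD):
    """Push next_reset forward by `period` while interest_pay_date is later than it."""
    reset = next_reset
    while interest_pay_date > reset:  # same test each Filler node would apply
        reset += period
    return reset

# Hypothetical dates: the reset date has to be extended three times.
example = roll_reset(date(2020, 7, 1), date(2020, 1, 1))
```

A loop handles any number of periods, whereas the stream needs one Filler node per period you expect to cover.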
I am attempting to cluster a group of customers based on spend, order frequency, order breadth and what % of purchases they make in each category (there are around 20).
It will probably be a simple answer, but I cannot figure out whether I should standardize (subtract the mean and divide by the sd) the % category buy columns or not. When I don't standardize, I can get around 90% of the variance explained in 4-5 principal components (using SVD), but when I standardize each column I only get around 40% for the same number of principal components. My worry is that because the columns are related, I am removing the relationship by standardizing. At the same time, I am worried that not standardizing will cause issues with the other variables in the data that I have standardized.
I would assume that others who tried clustering in this way would face a similar issue, but I can't seem to find one, so it might be that I just don't understand the situation. Thanks in advance for any clarification!
Chris,
Percentage scale has a well defined range and nice properties.
By heuristically scaling these features you usually make things worse.
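One way to act on this is to z-score only the unbounded features (spend, order frequency, order breadth) and pass the percentage columns through untouched. The sketch below is an interpretation of the advice above, not a recipe from it. The field names are hypothetical, and it assumes the percentages are already stored as fractions between 0 and 1.

```python
from statistics import mean, pstdev

def standardize(column):
    """Return z-scores for one column, using the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(x - mu) / sigma for x in column]

def prepare_features(customers, zscore_fields, percent_fields):
    """Z-score the unbounded fields; leave the percentage fields as they are."""
    scaled = {f: standardize([c[f] for c in customers]) for f in zscore_fields}
    rows = []
    for i, customer in enumerate(customers):
        row = [scaled[f][i] for f in zscore_fields]
        row += [customer[f] for f in percent_fields]  # assumed to be in [0, 1]
        rows.append(row)
    return rows

# Hypothetical customers with one unbounded field and two category shares.
customers = [
    {"spend": 10, "pct_a": 0.2, "pct_b": 0.8},
    {"spend": 30, "pct_a": 0.5, "pct_b": 0.5},
]
features = prepare_features(customers, ["spend"], ["pct_a", "pct_b"])
```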
I work for a small non-profit organization and we are looking for a way to quickly tally scores and provide a few statistics for donors at our annual golf tournament. I thought this would be fairly easy, but I'm struggling to come up with a database schema to capture the scores. I can't figure out how the player's score relates to the specific hole on the course.
This is the diagram that I have so far. Am I way off base with this?
The Schema can be found here: https://app.quickdatabasediagrams.com/#/schema/forneGJp40inm7rWlf2Sbg
Perhaps make the Scores table a m:n join table between Players and Holes to capture each player's score on each hole. This is depicted on the diagram below. To get the score for a round you'd sum all scores for all holes with a specific CourseId, for a specific event.
I also denormalised it a little, adding a total score to the Rounds table. This means that you don't need to SUM() the individual scores every time to get the tallies for each Player's Round. That's just a suggestion for performance optimisation.
Source: https://app.quickdatabasediagrams.com/#/schema/x_amshIckkeGp8KAKEAmLQ
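The join-table idea can be sketched with Python's built-in sqlite3 module as follows. Scores, Players, Holes, and CourseId come from the description above. Every other column name, the sample rows, and the helper round_total are assumptions made for illustration, and the Events, Courses, and Rounds tables are left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Players (PlayerId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Holes (
    HoleId INTEGER PRIMARY KEY,
    CourseId INTEGER NOT NULL,
    HoleNumber INTEGER NOT NULL,
    Par INTEGER NOT NULL
);
-- Join table: one row per player per hole per event.
-- This key assumes a course is played at most once per event.
CREATE TABLE Scores (
    EventId INTEGER NOT NULL,
    PlayerId INTEGER NOT NULL REFERENCES Players(PlayerId),
    HoleId INTEGER NOT NULL REFERENCES Holes(HoleId),
    Strokes INTEGER NOT NULL,
    PRIMARY KEY (EventId, PlayerId, HoleId)
);
""")

# Made-up sample data: one player, two holes on course 10, event 1.
conn.execute("INSERT INTO Players VALUES (1, 'Ann')")
conn.executemany("INSERT INTO Holes VALUES (?, ?, ?, ?)", [(1, 10, 1, 4), (2, 10, 2, 3)])
conn.executemany("INSERT INTO Scores VALUES (?, ?, ?, ?)", [(1, 1, 1, 5), (1, 1, 2, 3)])

def round_total(conn, event_id, course_id, player_id):
    """Sum a player's strokes over all holes of one course in one event."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(s.Strokes), 0)
        FROM Scores AS s
        JOIN Holes AS h ON h.HoleId = s.HoleId
        WHERE s.EventId = ? AND h.CourseId = ? AND s.PlayerId = ?
        """,
        (event_id, course_id, player_id),
    ).fetchone()
    return row[0]
```

Storing the total on the Rounds table, as suggested above, would replace this query with a single-row lookup at the cost of keeping the two values in sync.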
If it is possible to play the same course twice in the same event (the first and last matches could be on the same course, for example) then you should provide for that.
I have two other suggestions:
I think the relationship between Events and Venues is in the wrong direction.
I suggest splitting Players into two tables. One representing the human being, and the other representing the human's participation in the round. Perhaps "Person" and "Contestant" would be good names.
I've been struggling with this for the past week. I would like to build three reporters (so I can extract this information) for:
The duration of contacts between pairs of agents (i and j).
The gap between consecutive contacts between pairs of agents (i and j).
Number of contacts that an agent has.
If you can give a (small) push in the right direction, I would be grateful!
If I have interpreted this correctly, this is something I would probably do with links (though the table suggestion by @Alan may be quicker). Create a link between pairs of agents as they make contact and the link can have attributes such as duration, time (tick) of previous contact, maximum time between contacts, number of contacts.
The problem is that the number of ties is going to be N(N-1)/2 where N is number of agents. For large N, I suspect this would be fairly slow, at least to create the links. If you are expecting a dense network, with most agents contacting each other, then create all the links during setup and simply update the attributes. If a sparse network, with each agent contacting only a limited number of others, create the link at initial contact.
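As a language-neutral illustration of the bookkeeping each link would carry, here is a small Python sketch using the sparse approach, where a record is created at first contact. The class and attribute names are hypothetical. It assumes you can supply, at every tick, the set of pairs currently in contact, and it measures a gap from the first tick without contact to the tick at which contact resumes.

```python
class ContactLog:
    """Tracks contact durations, gaps, and counts per unordered pair of agents."""

    def __init__(self):
        self.links = {}  # frozenset({i, j}) -> attribute dict, created on first contact

    def update(self, tick, pairs_in_contact):
        """Call once per tick with the (i, j) pairs that are in contact at that tick."""
        current = {frozenset(pair) for pair in pairs_in_contact}
        for key in current:
            link = self.links.setdefault(key, {
                "active": False, "start": None, "last_end": None,
                "durations": [], "gaps": [], "count": 0,
            })
            if not link["active"]:  # a new contact episode begins
                if link["last_end"] is not None:
                    link["gaps"].append(tick - link["last_end"])
                link.update(active=True, start=tick)
                link["count"] += 1
        for key, link in self.links.items():
            if link["active"] and key not in current:  # episode just ended
                link["durations"].append(tick - link["start"])
                link.update(active=False, last_end=tick)

    def contacts_of(self, agent):
        """Number of contact episodes the agent has taken part in."""
        return sum(l["count"] for key, l in self.links.items() if agent in key)

# Hypothetical run: agents 1 and 2 meet twice, agents 1 and 3 meet once.
log = ContactLog()
schedule = [[(1, 2)], [(1, 2)], [], [], [(1, 2), (1, 3)], []]
for tick, pairs in enumerate(schedule):
    log.update(tick, pairs)
```

The three lists and counters map directly onto the three reporters asked for: durations, gaps between consecutive contacts, and number of contacts per agent.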
I'm working with biological data - namely groups of genes. For example:
group 1: geneA geneB geneC
group 2: geneD geneE
group 3: geneF geneG geneH
For each pair of genes, geneX and geneY, I have a score telling how similar the two genes are (actually, I have two scores, since I used BLAST, which is 'directional': I first searched geneX against all the other genes, then geneY against all the other genes, so I have two geneX--geneY scores, but I guess I can take the lower score of the two, or the average).
So, let's suppose I have only one score for each pair of genes. My data can be viewed as an undirected graph in which each edge has a score attached to it.
Now, what I would like to do is:
Visualize my data interactively: being able to click on gene nodes and open a link attached to them, show only edges above/below some threshold, control how the network is "spread", etc.
Cluster together groups which are similar, i.e. groups that have similar genes.
Any ideas on how I can do that? I guess it's basic clustering, and I would appreciate any hints on packages/software that could help here.
Thank you.
You'll probably get better responses if you ask this over at BioStar, the bioinformatics stackexchange.
Specifically, many of the answers in this thread might be relevant:
Which is the best software to represent biological pathways in a directed graph (network)?
You can try cluto. You will have to transform your triples (gene_1, gene_2, similarity) into a matrix and use 'scluster'.
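The triple-to-matrix step might look like the following Python sketch. The function name and sample scores are hypothetical. It collapses the two directional BLAST scores for a pair into one value with min, as the question suggests, and you can pass statistics.mean instead to average them. Writing the result out in the file format that CLUTO expects is not shown here.

```python
def similarity_matrix(triples, combine=min):
    """Build a symmetric matrix from directional (gene_1, gene_2, score) triples."""
    genes = sorted({g for a, b, _ in triples for g in (a, b)})
    index = {gene: i for i, gene in enumerate(genes)}

    # Collect every score seen for each unordered pair, in either direction.
    pair_scores = {}
    for a, b, score in triples:
        pair_scores.setdefault(frozenset((a, b)), []).append(score)

    n = len(genes)
    matrix = [[0.0] * n for _ in range(n)]  # pairs with no hit stay at 0.0
    for pair, scores in pair_scores.items():
        if len(pair) != 2:
            continue  # skip self-hits such as (geneA, geneA)
        a, b = pair
        value = combine(scores)
        matrix[index[a]][index[b]] = value
        matrix[index[b]][index[a]] = value
    return genes, matrix

# Made-up scores: geneA/geneB was scored in both directions.
triples = [("geneA", "geneB", 0.9), ("geneB", "geneA", 0.7), ("geneA", "geneC", 0.4)]
genes, matrix = similarity_matrix(triples)
```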