Minimizing Number of Hops When Passing Attendance Sheet in Class - dijkstra

My CS class has 7 rows and 11 columns of seats. I have noticed that whenever my teacher conducts attendance checks and passes one person a piece of paper to hand to everyone in my class, someone is usually required to get up and physically move to pass the paper. The usual seating positions of the students are
0,9
0,10
1,0
1,10
2,9
2,10
3,4
3,10
4,2
4,6
4,7
4,8
5,0
5,5
6,2
6,10
with 0,9 being the starting vertex where the professor begins to pass the attendance sheet. How would I minimize the number of hops (i.e. the number of times someone has to get up to pass the sheet) when passing this sheet around? This isn't a homework assignment or anything, I am just genuinely curious. Also, it is sometimes a bit annoying when people get up to pass the attendance sheet during the lecture.
I thought about using Dijkstra's algorithm to calculate this (the adjacency matrix implementation), but I am not entirely sure how I would calculate the weights between seats. Any consideration of this (ultimately) meaningless problem would be appreciated.
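One way to make the weights concrete (a minimal sketch, and only one possible model: here a pass to a directly neighbouring seat, including diagonals, costs 0, and anything farther costs 1 because someone has to get up) is to run Dijkstra over the occupied seats:

    import heapq

    # Occupied seats (row, col); (0, 9) is where the professor starts.
    seats = [(0, 9), (0, 10), (1, 0), (1, 10), (2, 9), (2, 10), (3, 4), (3, 10),
             (4, 2), (4, 6), (4, 7), (4, 8), (5, 0), (5, 5), (6, 2), (6, 10)]

    def weight(a, b):
        """Assumed weighting: passing to a neighbouring seat (Chebyshev
        distance 1, i.e. directly beside/behind/diagonal) costs 0 hops,
        anything farther requires someone to get up, costing 1 hop."""
        dist = max(abs(a[0] - b[0]), abs(a[1] - b[1]))
        return 0 if dist <= 1 else 1

    def dijkstra(source):
        """Minimum number of 'get-ups' needed to move the sheet from
        `source` to every other occupied seat."""
        dist = {s: float("inf") for s in seats}
        dist[source] = 0
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v in seats:
                if v == u:
                    continue
                nd = d + weight(u, v)
                if nd < dist[v]:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    print(dijkstra((0, 9)))

Strictly speaking, getting the sheet to everyone is closer to a covering-path problem than a single-source shortest path, but the distances above at least show how many "get-ups" separate any student from the professor under this weighting.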

Related

Paired Samples t-test or Two-sample (heteroscedastic)?

My dataset consists of many patients who went through a treatment, and I want to understand if there is a difference between before and after taking it. At first glance it clearly looks like a paired-samples t-test; the problem is that I have a different number of historical observations for each patient. Let's say I have:
patient A with 150 observations before and 350 observations after,
patient B with 50 observations before and 300 observations after...
Which test should I use? The one for paired samples (even though n_before ≠ n_after)?
The more general Welch's t-test (since the variances are heteroscedastic)?
Is there anything I am missing or doing wrong?
In the case of paired samples, would I need to use undersampling in order to make the lengths match?
Thanks to everyone
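For concreteness, a minimal sketch of the two options with scipy.stats on made-up data (collapsing each patient to one mean per period before the paired test is just one possible choice):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical raw observations: unequal counts before/after per patient.
    before = {"A": rng.normal(5.0, 1.0, 150), "B": rng.normal(5.2, 1.5, 50)}
    after  = {"A": rng.normal(4.6, 1.0, 350), "B": rng.normal(4.9, 1.5, 300)}

    # Option 1: collapse each patient to one summary value per period,
    # then run a paired t-test across patients (you would need more than
    # two patients in practice; this only illustrates the mechanics).
    before_means = [before[p].mean() for p in sorted(before)]
    after_means  = [after[p].mean() for p in sorted(after)]
    print(stats.ttest_rel(before_means, after_means))

    # Option 2: pool all observations and run Welch's t-test (unequal
    # variances, unequal sample sizes) -- but this ignores the fact that
    # observations from the same patient are correlated.
    all_before = np.concatenate(list(before.values()))
    all_after  = np.concatenate(list(after.values()))
    print(stats.ttest_ind(all_before, all_after, equal_var=False))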

Am I using PCA in Orange in a correct way?

I am analysing whether 15 books can be grouped according to 6 variables (of the 15 books, 2 are written by one author, 6 by another, and 7 by a third). I counted the number of occurrences of the variables and calculated the percentages. Then I used the Orange software to run PCA. I uploaded the file and selected the columns and rows. When it comes to PCA, the program asks me whether I want to normalize the data or not, but I am not sure about that because I have already calculated the percentages - is normalizing different from calculating the percentage? Moreover, below the normalize button it asks me to "show only: ..." and I have to choose a number between 0 and 100, but I don't really know what it is.
Could you help me understand what I should do? Thank you in advance
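As a rough illustration of what "normalize" usually means in this context (standardizing each variable to mean 0 and standard deviation 1, which is not the same as converting counts to percentages), here is a minimal scikit-learn sketch on made-up data:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Hypothetical 15 books x 6 variables, already expressed as percentages.
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 100, size=(15, 6))

    # PCA on the raw percentages: variables with larger spread dominate.
    pca_raw = PCA().fit(X)
    print("raw       ", pca_raw.explained_variance_ratio_.round(3))

    # "Normalize" typically means standardizing each column to mean 0 and
    # standard deviation 1, so every variable gets equal weight regardless
    # of its scale.
    X_std = StandardScaler().fit_transform(X)
    pca_std = PCA().fit(X_std)
    print("normalized", pca_std.explained_variance_ratio_.round(3))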

How to train an ANN to play a card game?

I would like to teach an ANN to play Hearts, but I am stuck on how to actually perform the training.
A friend suggested using Weka for the implementation of the actual ANN, but I've never used it, so I'm leaning towards a custom implementation.
I have programmed the rules and I can let the computer play a game, choosing random but legal cards each turn.
Now I am at a loss as to what to send to the ANN as input, how to extract the output (the number of cards decreases each turn, so I can't let each output neuron be a possible card), and how and when to perform the training.
My guess is to give the ANN as input:
The cards that have been played previously, with metadata of which player has played which card
The cards on the table for this turn, also with the same metadata
The cards in the ANN's hand
And then have the output be 13 neurons (the maximum number of cards per player), of which I take the most activated one among the cards that are still in the ANN's hand.
I also don't really know when to teach it (after each turn or after each game), as it is beneficial to have all the penalty cards, but bad to have all but one penalty card.
Any and all help is appreciated. I don't really know where else to put this question.
I currently have it programmed in Swift, but it's only 200 lines and I know a few other languages, so I can translate it.
Note that neural networks might not be the best thing to use here. More on that at the end of the answer; I'll answer your questions first.
Now I am at a loss as to what to send to the ANN as input, how to extract the output (the number of cards decreases each turn, so I can't let each output neuron be a possible card), and how and when to perform the training.
ANNs require labeled training data. This means pairs (X, y), where X is whatever (structured) data relates to your problem and y is the correct answer you expect the ANN to learn for X.
For example, think about how you would learn math in school. The teacher will do a couple of exercises on the blackboard, and you will write those down. This is your training data.
Then, the teacher will invite you to the blackboard to do one on your own. You might not do so well at first, but he/she will guide you in the right direction. This is the training part.
Then, you'll have to do problems on your own, hopefully having learnt how.
The thing is, even this trivial example is much too complex for an ANN. An ANN usually takes in real-valued numbers and outputs one or more real-valued numbers. So it's actually much dumber than a grade schooler who learns about ax + b = 0 type equations.
For your particular problem, it can be hard to see how it fits this format. As a whole, it doesn't: you can't present the ANN with a game and have it learn the moves; that is much too complex. You need to present it with something that has a correct numerical label associated with it, so that the ANN can learn the underlying pattern.
To do this, you should break your problem up into subproblems. For example, input the current player's cards and expect as output the correct move.
The cards that have been played previously, with metadata of which player has played which card
The ANN should only care about the current player. I would not use metadata or any other information that identifies the players.
Giving it a history could get complicated. You might want recurrent neural networks for that.
The cards on the table for this turn, also with the same metadata
Yes, but again, I wouldn't use metadata.
The cards in the ANN's hand
Also good.
Make sure you have as many input units as the MAXIMUM number of cards you want to input (2 x total possible cards, for the cards in hand and those on the table). This will be a binary vector where the ith position is true if the card corresponding to that position exists in hand / on the table.
Then do the same for moves: you will have m binary output units, where the ith will be true if the ANN thinks you should do move i, where there are m possible moves in total (pick the max if m depends on stages in the game).
Your training data will also have to be in this format. For simplicity, let's say there can be at most 2 cards in hand and 2 on the table, out of a total of 5 cards, and we can choose from 2 moves (say fold and all in). Then a possible training instance is:
Xi = 1 0 0 1 0 0 0 0 1 1 (meaning cards 1 and 4 in hand, cards 4 and 5 on table)
yi = 0 1 (meaning you should go all in in this case)
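For concreteness, here is a minimal sketch of that toy encoding in Python (the helper names are just illustrative):

    # Toy encoding from the example above: 5 possible cards, 2 possible moves.
    N_CARDS = 5
    MOVES = ["fold", "all in"]

    def encode_state(hand, table):
        """Binary vector of length 2 * N_CARDS: first block = cards in hand,
        second block = cards on the table (card indices are 1-based here)."""
        x = [0] * (2 * N_CARDS)
        for c in hand:
            x[c - 1] = 1
        for c in table:
            x[N_CARDS + c - 1] = 1
        return x

    def encode_move(move):
        """One-hot vector over the possible moves."""
        y = [0] * len(MOVES)
        y[MOVES.index(move)] = 1
        return y

    # Cards 1 and 4 in hand, cards 4 and 5 on the table, correct move "all in":
    X_i = encode_state(hand=[1, 4], table=[4, 5])
    y_i = encode_move("all in")
    print(X_i)   # [1, 0, 0, 1, 0, 0, 0, 0, 1, 1]
    print(y_i)   # [0, 1]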
I also don't really know when to teach it (after each turn or after each game), as it is beneficial to have all the penalty cards, but bad to have all but one penalty card.
You should gather a lot of labeled training data in the format I described, train it on that, and then use it. You will need thousands or even tens of thousands of games to see good performance. Teaching it after each turn or game is unlikely to do well.
This will lead to very large neural networks. Another thing that you might try is to predict who will win given a current game configuration. This will significantly reduce the number of output units, making learning easier. For example, given the cards currently on the table and in hand, what is the probability that the current player will win? With enough training data, neural networks can attempt to learn these probabilities.
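As a rough sketch of that alternative, using scikit-learn's MLPClassifier as a stand-in for a hand-rolled network and random placeholder data where real logged games would go:

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Assume each game state is already encoded as a fixed-length binary vector
    # (as above) and labelled 1 if the current player went on to win, else 0.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(1000, 10))   # placeholder encoded states
    y = rng.integers(0, 2, size=1000)         # placeholder outcomes

    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    net.fit(X, y)

    # A single probability output: the network's estimate that the current
    # player wins from this state.
    state = rng.integers(0, 2, size=(1, 10))
    print(net.predict_proba(state)[0, 1])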
There are obvious shortcomings: the need for large training data sets, and no memory of how the game has gone so far (unless you use much more advanced nets).
For games such as these, I suggest you read about reinforcement learning, or dedicated algorithms for your particular game. You're not going to have much luck teaching an ANN to play chess for example, and I doubt you will teaching it to play a card game.
First of all, you need to create a good training data set for the ANN. If your budget allows, you can ask some card professionals to share enough of their matches with you, showing how they played. Another way of generating data could be bots that play the game. Then you need to think about how to represent the match data to the neural network. I also recommend representing cards not by their value (0.2, 0.3, 0.4, ..., 0.10, 0.11 for the jack), but as separate inputs. Also look into elastic neural networks, which can be used for this kind of task.

Shannon's Entropy measure in Decision Trees

Why is Shannon's Entropy measure used in Decision Tree branching?
Entropy(S) = - p(+)log( p(+) ) - p(-)log( p(-) )
I know it is a measure of the number of bits needed to encode information; the more uniform the distribution, the higher the entropy. But I don't see why it is so frequently applied when building decision trees (choosing a branch point).
Because you want to ask the question that will give you the most information. The goal is to minimize the number of decisions/questions/branches in the tree, so you start with the question that will give you the most information and then use the following questions to fill in the details.
For the sake of decision trees, forget about the number of bits and just focus on the formula itself. Consider a binary (+/-) classification task where you have an equal number of + and - examples in your training data. Initially, the entropy will be 1 since p(+) = p(-) = 0.5. You want to split the data on an attribute that most decreases the entropy (i.e., makes the distribution of classes least random). If you choose an attribute, A1, that is completely unrelated to the classes, then the entropy will still be 1 after splitting the data by the values of A1, so there is no reduction in entropy. Now suppose another attribute, A2, perfectly separates the classes (e.g., the class is always + for A2="yes" and always - for A2="no"). In this case, the entropy is zero, which is the ideal case.
In practical cases, attributes don't typically perfectly categorize the data (the entropy is greater than zero). So you choose the attribute that "best" categorizes the data (provides the greatest reduction in entropy). Once the data are separated in this manner, another attribute is selected for each of the branches from the first split in a similar manner to further reduce the entropy along that branch. This process is continued to construct the tree.
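A quick numeric sketch of that comparison, computing the entropy of the root node and the information gain for an uninformative attribute (A1) versus a perfectly separating one (A2):

    import math

    def entropy(pos, neg):
        """Shannon entropy (base 2) of a two-class node with `pos` and `neg` examples."""
        total = pos + neg
        h = 0.0
        for count in (pos, neg):
            if count:
                p = count / total
                h -= p * math.log2(p)
        return h

    # 50 '+' and 50 '-' examples: entropy is 1 bit before any split.
    print(entropy(50, 50))                      # 1.0

    # A1 is unrelated to the class: both branches stay 50/50, so the gain is 0.
    gain_a1 = entropy(50, 50) - (0.5 * entropy(25, 25) + 0.5 * entropy(25, 25))
    print(gain_a1)                              # 0.0

    # A2 separates the classes perfectly: both branches are pure, gain = 1 bit.
    gain_a2 = entropy(50, 50) - (0.5 * entropy(50, 0) + 0.5 * entropy(0, 50))
    print(gain_a2)                              # 1.0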
You seem to have an understanding of the math behind the method, but here is a simple example that might give you some intuition behind why this method is used: Imagine you are in a classroom that is occupied by 100 students. Each student is sitting at a desk, and the desks are organized such that there are 10 rows and 10 columns. 1 out of the 100 students has a prize that you can have, but you must guess which student it is to get the prize. The catch is that every time you guess, the prize decreases in value. You could start by asking each student individually whether or not they have the prize. However, initially, you only have a 1/100 chance of guessing correctly, and it is likely that by the time you find the prize it will be worthless (think of every guess as a branch in your decision tree). Instead, you could ask broad questions that dramatically reduce the search space with each question. For example, "Is the student somewhere in rows 1 through 5?" Whether the answer is "Yes" or "No", you have reduced the number of potential branches in your tree by half.

How to visualize gene networks and cluster groups of genes?

I'm working with biological data - namely groups of genes. For example:
group 1: geneA geneB geneC
group 2: geneD geneE
group 3: geneF geneG geneH
For each pair of genes, geneX and geneY, I have a score telling how similar the two genes are (actually, I have two scores, since I used BLAST, which is 'directional': I first searched geneX against all the other genes and then geneY against all the other genes, so I have two geneX--geneY scores, but I guess I can take the lower of the two, or the average).
So, let's suppose I have only one score for each pair of genes. My data can be viewed as an undirected graph:
and recall each edge has a score attached to it.
Now, what I would like to do is:
Visualize my data interactively: being able to click on gene nodes and open a link attached to them, show only edges above/below some threshold, control how the network is "spread", etc.
Cluster together groups which are similar, i.e. groups that have similar genes.
Any ideas on how I can do that? I guess it's basic clustering, and I would appreciate any hints on packages/software that could help here.
Thank you.
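For concreteness, a minimal sketch of the graph-building and edge-thresholding part, assuming networkx and made-up scores (interactive clicking and per-node links would need a dedicated viewer on top of this):

    import networkx as nx

    # Made-up similarity scores between gene pairs (gene_1, gene_2, score).
    triples = [
        ("geneA", "geneB", 0.92), ("geneA", "geneC", 0.40),
        ("geneB", "geneC", 0.75), ("geneD", "geneE", 0.88),
        ("geneC", "geneD", 0.10),
    ]

    G = nx.Graph()
    for g1, g2, score in triples:
        G.add_edge(g1, g2, weight=score)

    # "Show only edges above some threshold": keep edges with weight >= 0.5.
    threshold = 0.5
    H = nx.Graph()
    H.add_edges_from((u, v, d) for u, v, d in G.edges(data=True)
                     if d["weight"] >= threshold)

    print(H.edges(data=True))
    # Connected components of the thresholded graph are one crude way
    # to group similar genes together.
    print(list(nx.connected_components(H)))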
You'll probably get better responses if you ask this over at BioStar, the bioinformatics stackexchange.
Specifically, many of the answers in this thread might be relevant:
Which is the best software to represent biological pathways in a directed graph (network)?
You can try cluto. You will have to transform your triples (gene_1, gene_2, similarity) into a matrix and use 'scluster'.
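If you go the matrix route, the transformation is roughly the following (made-up triples; SciPy's hierarchical clustering stands in here for whichever tool the matrix is ultimately fed to):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    # Made-up (gene_1, gene_2, similarity) triples; take the lower or the mean
    # of the two directional BLAST scores beforehand so each pair has one value.
    triples = [("geneA", "geneB", 0.92), ("geneA", "geneC", 0.40),
               ("geneB", "geneC", 0.75), ("geneD", "geneE", 0.88)]

    genes = sorted({g for g1, g2, _ in triples for g in (g1, g2)})
    index = {g: i for i, g in enumerate(genes)}

    # Symmetric similarity matrix; missing pairs stay at 0, self-similarity at 1.
    S = np.zeros((len(genes), len(genes)))
    np.fill_diagonal(S, 1.0)
    for g1, g2, score in triples:
        S[index[g1], index[g2]] = score
        S[index[g2], index[g1]] = score

    # Convert similarity to distance and cluster hierarchically as one option.
    D = 1.0 - S
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D), method="average")
    labels = fcluster(Z, t=0.5, criterion="distance")
    print(dict(zip(genes, labels)))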