Can an artificial neural network predict the outcome of sports games? [closed] - neural-network

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I was trying to find something original and fun to do with artificial neural networks (ANNs) as a personal/learning project, and I thought it would be cool to predict the results of sports games (especially NHL games).
I'm pretty sure it would be easy to evolve an ANN that can predict which team is most likely to win (usually the team with the better record). However, what I would like to do is create an ANN that would tell how likely the outcome is, similar to bookmaker odds.
Is this something an ANN can do? If so, what kind of success can I expect? I know I can't beat the bookmaker (at least not with a software solution); I want to do this as a recreational project/challenge to myself. I don't expect to bet money on sports games with this project.

Way back in the days of the IBM XT I played with a shareware ANN program to try and improve my chances on the British football (soccer) pools. This is a form of betting where you try to predict which football matches will result in draws. I assigned each team a number, then looked back through past results and from them generated a single digit for each result. From memory it was 0 for a home win, 1 for an away win and 2 for a draw. Each result went on a single line in a training file. I would then run the training file through the program to generate the ANN settings, look up the following Saturday's matches, feed them into the ANN and look for matches predicted as draws.
As the weeks went on my predictions of draws did definitely become more and more accurate. However ...
1) The XT was so slow that by Christmas it was taking 24 hours to generate the ANN settings from the training data. I really had better things to do with my precious (and expensive) PC.
2) Although it was better at predicting draws it wasn't predicting enough to actually win any money. Looking back I suppose the program had just worked out that Manchester United would always beat Sheffield United. This was more football knowledge than I had but not enough to win any money.
3) Entering the results into the training data and then generating the forthcoming matches' data was taking me ages, and to be honest, sport bores me rigid.
So I gave up and didn't become a millionaire.
These days, however, PCs are much faster and much of the training data could be scraped from the web. I still doubt it is a route to a fortune, but it's certainly an interesting project.
Ian

A reply above stated:
I know that if the bookmakers' odds could be beaten by an ANN, bookmakers would already be using one to fix their odds.
Bookmakers don't set the line based on their analysis of the teams; they set it based on their analysis of the betting public's opinion of the teams. The ideal line for the bookie is one where exactly the same amount is bet on each side of the line, because then he is guaranteed a profit equal to the 'juice' on the losers' bets. They move the line as the game approaches to try to keep that 50/50 split. A bookie may think Home team -5 is the accurate line based on game analysis, but if he expects that line to draw twice the money on the Home team, he will not set it at -5; he will set it at -7 or -8, wherever he expects the money to split evenly between both sides.

ANNs are really good at pattern matching and prediction, so yes, odds are you could build an ANN that does what you want.
You'll need more than just the team win/loss ratio to make it really effective, however. Feed it stats for the players too. For real effectiveness, try to include game-flow information, like which players are on the line for each play (in football, for example).
Ultimately, the biggest problem you'll run into (aside from the whole "writing the ANN" issue) is getting the data you need to feed it.
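To make that concrete, here is a sketch of what one input row could look like. Everything below is illustrative; the field names and stats are assumptions on my part, not a known-good feature set:

    # Hypothetical feature row for one NHL game; all fields are illustrative.
    def game_features(home, away):
        return [
            home["win_pct"] - away["win_pct"],               # record differential
            home["goals_for_pg"] - away["goals_against_pg"],
            home["starter_save_pct"],                        # goalie form
            away["starter_save_pct"],
            home["rest_days"], away["rest_days"],            # schedule context
        ]

    home = {"win_pct": 0.62, "goals_for_pg": 3.4, "goals_against_pg": 2.6,
            "starter_save_pct": 0.915, "rest_days": 2}
    away = {"win_pct": 0.48, "goals_for_pg": 2.9, "goals_against_pg": 3.1,
            "starter_save_pct": 0.902, "rest_days": 1}
    print(game_features(home, away))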

I've done some stock market predictions with an AI and my conclusion is that it is not very hard to make an AI that gets good results with the historical data.
Making winning transactions in the future is a different ballgame.

I have just worked on this very problem (predicting English Premier League games) for the past 10 days, and ended up with very similar results using 3 different methods: SVM, Logistic Regression, and NN.
LR and NN give probabilities directly. SVM outputs 0/1, but it can be tweaked to produce probabilities too (I haven't tried that yet).
I needed a "massive" (by my standards at least) feature set though (almost 300 features) and a good chunk of data (13 years' worth).
As for the data, I simply got it from the web.
Conclusion: I can just about match the bookies in terms of accuracy (predicting victories, in my case). If I add the pre-match odds to the feature set, I get exactly the same accuracy as the bookies (as expected), but no better, which surely means my feature set is already summarized in the bookies' odds, and they have a little extra knowledge on top.
I'm sure there is a way to get better accuracy, either by improving the algos, or more likely by having extremely granular data (as in which players play which games, for how many minutes, and a lot of player-level historical stats, so as to build bottom-up models of team performance).
But the bottom line is that I can testify NNs work quite well for this purpose. SVM was slightly better, though, in my limited experience.
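To illustrate the probability point, here is a minimal sketch in Python with scikit-learn; the data here is randomly generated filler, not my actual feature set. LogisticRegression and MLPClassifier expose predict_proba directly, while SVC needs probability=True to produce calibrated probabilities:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4500, 20))                         # placeholder features
    y = (X[:, 0] + rng.normal(size=4500) > 0).astype(int)   # placeholder labels

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "neural network": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500),
        "svm": SVC(probability=True),   # enables predict_proba via calibration
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]   # P(home win)
        print(f"{name}: accuracy={model.score(X_test, y_test):.3f}, "
              f"first probabilities={proba[:3]}")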

I think it's indeed all about data, but there's no end to what you could feed it in order to be more accurate: winning/losing streaks, players' biorhythms, players' girlfriends' moods before the game, minor/major injuries they suffered in the recent past, off-field events that are bothering the players, etc., etc., etc.
But I don't think you can accurately predict which team is more likely to win; it would just be a more-or-less educated guess.

In my opinion and experience, because of the excessively large number of factors in play, designing and training the ANN will be unreasonably complex and time-consuming. ANNs are good at pattern matching, and game prediction requires more deductive reasoning than mere pattern matching.
But if you want to enjoy learning neural networks, it will be a good adventure. If you are successful, you might want to host your code somewhere for others to see and learn!
For game prediction, it would be much easier and faster with decision trees or a rules engine and so on. This will be no easy task either, but it will be another interesting activity.
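For what it's worth, a decision-tree baseline is only a few lines in Python with scikit-learn; the data below is placeholder filler, and a real model would use whatever team stats you assemble:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 8))     # placeholder team-stat features
    y = (X[:, 0] > 0).astype(int)      # placeholder outcome: 1 = home win

    tree = DecisionTreeClassifier(max_depth=4).fit(X, y)
    print(tree.predict(X[:3]))         # predicted outcomes for three games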

My belief is that the unpredictability of an event is due to a lack of information and understanding. If you had all the knowledge, then yes, it could be done; or rather, the more knowledge you have, the better it can be done.
So in theory, the answer is yes.
However, in practice, you can get a PhD and have a whole career working on this question and you still may not succeed.

Related

What to do with not enough training data?

I have a problem: I don't have enough training data for my NN. It is trying to predict the result of a soccer game given the previous games, which I would say is a regression task.
The training data are the results of soccer games from the last 15 seasons (about 4,500 games). Getting new data would be hard and would take a lot of time.
What should I do now?
Is it good to duplicate the data?
Should I input randomized data? (Maybe noise, but I'm not quite sure what that is.)
If there is no way of creating more data, I should probably turn up the learning rate, right? (I have it sitting at 0.01 and the momentum at 0.9.)
I am using mini-batches of 32 samples in training. Since I don't have a lot of training data, I don't have a lot of mini-batches. Should I stop using them?
To start from the beginning: this is a very theoretical question and is not directly related to programming, so I recommend posting such questions (in future) over on the Data Science Stack Exchange.
To get to your problem: 4,500 samples is not as bad as it sounds, depending on the exact task at hand. Are you trying to predict the match result (i.e., which team is the winner), or are you looking for more specific predictions (across a lot of different, specific teams)?
If you can make sure that you have a reasonable amount of data per class, you can work with fewer samples than you have. Simply duplicating the data will not help you much, since you are very likely to just overfit on the samples you are seeing, without much of an improvement; or rather, you will get the same results as training over a longer period (since you essentially see every sample twice per epoch, instead of once).
Again, what usually happens after long training periods is overfitting, so nothing is gained here.
Your second suggestion is generally called data augmentation. Instead of simply copying samples, you alter them enough to make it look "different" to the network. But be careful! Data augmentation works well for some inputs, like images, since the change in input is significant enough to not represent the same sample, but still contains meaningful information about the class (a horizontally mirrored image of a cat still shows a "valid cat", unlike a vertically mirrored image, which is more unrealistic in the real world).
Essentially, your input features determine where it makes sense to add noise. If you are only perturbing the results of previous games, a minor change in the input (adding or subtracting one goal at random) can significantly change the prediction you should be making.
If, on the other hand, you slightly perturb Elo scores by a small random amount, the input value will not be too different, but different enough, to serve as a novel example.
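As a concrete illustration of that kind of augmentation, here is a minimal sketch in Python; the column layout and noise scale are assumptions for illustration, not values from the question:

    import numpy as np

    def augment_with_noise(X, elo_columns, scale=5.0, copies=3, seed=0):
        # Return X plus noisy copies, perturbing only the Elo-like columns.
        # scale is the std-dev of the Gaussian noise, in rating points.
        rng = np.random.default_rng(seed)
        augmented = [X]
        for _ in range(copies):
            X_noisy = X.copy()
            X_noisy[:, elo_columns] += rng.normal(
                0.0, scale, size=(X.shape[0], len(elo_columns)))
            augmented.append(X_noisy)
        return np.vstack(augmented)

    # Placeholder data: 4500 games, columns 0 and 1 are the two teams' Elo ratings.
    X = np.random.default_rng(1).normal(1500, 100, size=(4500, 2))
    X_aug = augment_with_noise(X, elo_columns=[0, 1])
    print(X.shape, "->", X_aug.shape)   # (4500, 2) -> (18000, 2)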
Turning up the learning rate is not a good idea, since you are essentially just letting the network converge more towards the specific samples. On the contrary, I would argue that the current learning rate is still too high, and you should certainly not increase it.
Regarding mini batches, I think I have referenced this a million times now, but always consider smaller minibatches. From a theoretical point of view, you are more likely to converge to a local minimum.

Newbie to Neural Networks

Just starting to play around with neural networks for fun after playing with some basic linear regression. I am an English teacher, so I don't have a math background, and trying to read a book on this stuff is way over my head. I thought this would be a better avenue to get some basic questions answered (even though I suspect there is no easy answer). I'm just looking for some general guidance put in layman's terms. I am using a trial version of an Excel add-in called NEURO XL. I apologize if these questions are too "elementary."
My first project is related to predicting a student's Verbal score on the SAT based on a number of test scores, GPA, practice exam scores, etc. as well as some qualitative data (gender: M=1, F=0; took SAT prep class: Y=1, N=0; plays varsity sports: Y=1, N=0).
In total, I have 21 variables that I would like to feed into the network, with the output being the actual score (200-800).
I have 9000 records of data spanning many years/students. Here are my questions:
How many records of the 9000 should I use to train the network?
1a. Should I completely randomize the selection of this training data or be more involved and make sure I include a variety of output scores and a wide range of each of the input variables?
If I split the data into even chunks, say 9x1000 (or however many), created a network for each one, and then tested each of these 9 networks on the other 8 sets to see which had the lowest MSE across the samples, would this be a valid way to "choose" the best network for predicting the scores of my incoming students (who are not included in this data at all)?
Since the scores on the tests that I am using as inputs vary in scale (some are on 1-100, and others 1-20 for example), should I normalize all of the inputs to their respective z-scores? When is this recommended vs not recommended?
I am predicting the actual score, but in reality, I'm NOT that concerned about the exact score but more of a range. Would my network be more accurate if I grouped the output scores into buckets and then tried to predict this number instead of the actual score?
E.g.
750-800 = 10
700-740 = 9
etc.
Is there any benefit to doing this or should I just go ahead and try to predict the exact score?
What if ALL I cared about was whether the score was above or below 600? Would I then just make the output 0 (below 600) or 1 (above 600)?
5a. I read somewhere that it's not good to use 0 and 1, but instead 0.1 and 0.9 - why is that?
5b. What about -1 (below 600), 0 (exactly 600), 1 (above 600) - would this work?
5c. Would the network always output -1, 0, 1, or would it output fractions that I would then have to round up or round down to finalize the prediction?
Once I have found the "best" network from Question #3, would I then play around with the different parameters (number of epochs, number of neurons in hidden layer, momentum, learning rate, etc.) to optimize this further?
6a. What about the activation function? Will log-sigmoid do the trick, or should I try the other options my software has as well (threshold, hyperbolic tangent, zero-based log-sigmoid)?
6b. What is the difference between log-sigmoid and zero-based log-sigmoid?
Thanks!
First a little bit of meta content about the question itself (and not about the answers to your questions).
I have to laugh a little that you say 'I apologize if these questions are too "elementary."' and then proceed to ask the single most thorough and well thought out question I've seen as someone's first post on SO.
I wouldn't be too worried that you'll have people looking down their noses at you for asking this stuff.
This is a pretty big question in terms of the depth and range of knowledge required, especially the statistical knowledge needed and familiarity with Neural Networks.
You may want to try breaking this up into several questions distributed across the different StackExchange sites.
Off the top of my head, some of it definitely belongs on the statistics StackExchange, Cross Validated: https://stats.stackexchange.com/
You might also want to try out https://datascience.stackexchange.com/ , a beta site specifically targeting machine learning and related areas.
That said, there is some of this that I think I can help to answer.
Anything I haven't answered is something I don't feel qualified to help you with.
Question 1
How many records of the 9000 should I use to train the network? 1a. Should I completely randomize the selection of this training data or be more involved and make sure I include a variety of output scores and a wide range of each of the input variables?
Randomizing the selection of training data is probably not a good idea.
Keep in mind that truly random data includes clusters.
A random selection of students could happen to consist solely of those who scored above a 30 on the ACT exams, which could potentially result in a bias in your result.
Likewise, if you only select students whose SAT scores were below 700, the classifier you build won't have any capacity to distinguish between a student expected to score 720 and a student expected to score 780 -- they'll look the same to the classifier because it was trained without the relevant information.
You want to ensure a representative sample of your different inputs and your different outputs.
Because you're dealing with input variables that may be correlated, you shouldn't try to do anything too complex in selecting this data, or you could mistakenly introduce another bias in your inputs.
Namely, you don't want to select a training data set that consists largely of outliers.
I would recommend trying to ensure that your inputs cover all possible values for all of the variables you are observing, and all possible results for the output (the SAT scores), without constraining how these requirements are satisfied.
I'm sure there are algorithms out there designed to do exactly this, but I don't know them myself -- possibly a good question in and of itself for Cross Validated.
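One standard technique along these lines, which I'll add here as my own suggestion rather than something from the question, is stratified sampling: bin the output score into ranges and ask the splitting routine to preserve the bin proportions on both sides. A minimal sketch in Python with scikit-learn, using made-up data in place of the real records:

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(9000, 21))                  # 21 input variables
    scores = rng.integers(200, 801, size=9000)       # SAT Verbal scores

    # Bin the continuous output so the split can preserve score coverage.
    bins = np.digitize(scores, bins=[400, 500, 600, 700])

    X_train, X_test, y_train, y_test = train_test_split(
        X, scores, test_size=0.2, stratify=bins, random_state=0)

    # Each score range now appears in both sets in roughly equal proportion.
    for name, y in (("train", y_train), ("test", y_test)):
        print(name, np.bincount(np.digitize(y, [400, 500, 600, 700])) / len(y))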
Question 3
Since the scores on the tests that I am using as inputs vary in scale (some are on 1-100, and others 1-20 for example), should I normalize all of the inputs to their respective z-scores? When is this recommended vs not recommended?
My understanding is that this is not recommended as the input to a neural network, but I may be wrong.
The convergence of the network should handle this for you.
Every node in the network will assign a weight to each of its inputs, multiply each input by its weight, and sum those products as a core part of its computation.
That means that every node in the network is searching for some coefficients for each of their inputs.
To do this, all inputs will be converted to numeric values -- so conditions like gender will be translated into "0=MALE,1=FEMALE" or something similar.
For example, a node's metric might look like this at a given point in time:
2*ACT_SCORE + 0*GENDER + (-5)*VARSITY_SPORTS ...
The coefficients for each value are exactly what the network is searching for as it converges.
If you change the scale of a value, like ACT_SCORE, you just change the scale of the coefficient that will be found, by the reciprocal of that scaling factor.
The result should still be the same.
There are other concerns in terms of accuracy (computers have limited capacity to represent small fractions) and speed that may enter this, but not being familiar with NEURO XL, I can't say whether or not they apply for this technology.
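If you do want to try z-scoring anyway, it is cheap to do outside the add-in. A minimal sketch in Python with scikit-learn (an assumption on my part; I don't know what NEURO XL does internally):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X = np.array([
        # ACT score (1-36), practice exam (1-100), GPA (0-4)
        [30.0, 85.0, 3.8],
        [22.0, 60.0, 3.1],
        [27.0, 72.0, 3.5],
    ])

    scaler = StandardScaler()        # z-score: (x - mean) / std, per column
    X_scaled = scaler.fit_transform(X)
    print(X_scaled)                  # every column now has mean 0, std 1
    # Fit the scaler on training data only, then reuse it for new students:
    # X_new_scaled = scaler.transform(X_new)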
Question 4
I am predicting the actual score, but in reality, I'm NOT that concerned about the exact score but more of a range. Would my network be more accurate if I grouped the output scores into buckets and then tried to predict this number instead of the actual score?
This will reduce accuracy, although you should converge to a solution much faster with fewer possible outputs (scores).
Neural Networks actually describe very high-dimensional functions in their input variables.
If you reduce the granularity of that function's output space, you essentially state that you don't care about local minima and maxima in that function, especially around the borders between your output scores.
As a result, you are sacrificing information that may be an essential component of the "true" function that you are searching for.
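If you want to experiment with buckets regardless, the mapping itself is trivial; a short sketch in Python with NumPy, using the 50-point buckets from the question:

    import numpy as np

    scores = np.array([480, 610, 745, 800, 560])
    edges = np.arange(250, 801, 50)        # bucket edges every 50 points
    buckets = np.digitize(scores, edges)   # coarser label for each score
    print(buckets)   # information near the bucket borders is lost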
I hope this has been helpful, but you really should break this question down into its many components and ask them separately on different sites -- potentially some of them do belong here on StackOverflow as well.

Making predictions from a CV

I have a database with many CVs, including structured data of the gender, age, address, number of years of education, and many other parameters of each person.
For about 10% of the sample, I also have additional data about a certain action they've made at some point in time. For instance, that Jane took a home loan in July 1998 or that John started pilot training in Jan. 2007 and got his license in Dec. 2007.
I need an algorithm that will give, for each of the actions, the probability that it will happen for each person in future time increments. For instance, that the chance of Bill taking a home loan is 2% in 2011, 3.5% in 2012, etc.
How should I approach this? Regression analysis? SVM? Neural net? Something else?
Is there perhaps even some standard tool/library that I can use with just the obvious customizations?
The probability that X happens given that Y happened is right out of Bayesian inference, I think.
Lou is right, this is the case for 'Bayesian Inference'.
The best tool/library to solve this is the R statistical programming language (r-project.org).
Take a look at the Bayesian Inference Libraries in R:
http://cran.r-project.org/web/views/Bayesian.html
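The linked libraries do the heavy lifting, but the underlying arithmetic is just Bayes' rule. A toy illustration in Python, with every number made up; a real model would estimate these quantities from the labeled 10% of the CVs:

    # P(action | attribute) via Bayes' rule, with invented numbers.
    p_loan = 0.10                # prior: fraction of people who took a home loan
    p_age_given_loan = 0.40      # P(age 30-40 | took loan), from the labeled data
    p_age = 0.25                 # P(age 30-40) across the whole database

    p_loan_given_age = p_age_given_loan * p_loan / p_age
    print(f"P(loan | age 30-40) = {p_loan_given_age:.0%}")   # 16%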
How many people are in the "10% of the sample"? If it's below 100 people or so, I would fear that the results of the analysis could not be significant. If it's 1000 or more people, the results will be quite good (rule of thumb).
I would first export the data to R (r-project) and do any necessary data cleaning. Then find a person familiar with R and advanced statistics; they will be able to solve this very quickly. Or try it yourself, but R takes some time to learn at the beginning.
Concerning the tool/library choice, I suggest you give Weka a try. It's an open source tool for experimenting with data mining and machine learning. Weka has several tools for reading, processing and filtering your data, as well as prediction and classification tools.
However, you must have a strong foundation in the above-mentioned fields in order to achieve a useful result.

Is LOC correct parameter for project estimation? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion.
Closed 11 years ago.
Is LOC a correct parameter for project estimation?
There are so many scenarios where complexity makes a single line of code take much more time.
Other than LOC, what would be a suggested parameter for project estimation?
As people are talking about the function points of a program, does that refer to use-case-related information?
I am trying to find a solid basis for estimating full software development, covering analysis, design, test case preparation, and coding. Please advise.
Steve McConnell in Rapid Development (Microsoft Press, 1996):
Because different programming languages produce such different bangs for a given number of lines of code, much of the software industry is moving toward a measure called "function points" to estimate program sizes. A function point is a synthetic measure of program size that is based on a weighted sum of the number of inputs, outputs, inquiries, and files. Function points are useful because they allow you to think about program size in a language-independent way.
Google "Function Point" for more information.
Seeing as developers are likely to* spend most of their time testing changes, lines of code is never a good indicator of the size of a problem.
Let's suppose you have an existing large application - changing a single line of code may seem trivial, but the test planning and execution could take weeks.
Likewise, adding a relatively large amount of code in a single, limited-scope module which is easily testable might take only a few days.
* they should do, at least. If they're spending more time writing code than testing it, it is probably full of bugs. And I mean BEFORE it reaches your dedicated QA team.
Only if you use it in the inverse.
-- Edit
But no. It isn't. It's a mostly useless measure, and generally harmful. As you note, less code is almost always better.
Other things to check? Well, what are you trying to measure? What result do you want to see from a change in the things that you would be checking? What sort of decisions will you be making on the basis of these changes?
LOC is one proxy measure for measuring the problem size.
A LOC estimate can be used, and a LOC count is relatively cheap to obtain from historical projects. But LOC can be problematic if used as anything other than a proxy for problem size, as other answers have already pointed out.
Problem size is rather constant given the requirements. From a size estimate you can derive effort, schedule and cost estimates, depending on your planning drivers such as cost or schedule. From historical data you can find the correlation between problem size and effort, and how other planning drivers further influence the outcome. So you need to measure size and effort against your other parameters and keep fine-tuning your estimation process. There are some LOC-to-effort measures available in the literature, but they are unlikely to be accurate for your domain, the technology you are using, and the team you have.
Other proxies for problem size are function points and story points. My experience with function points is that they are rarely worth the effort. On the other hand, story points in agile methods work very well, since they are deliberately abstract (thus avoiding a lot of the problems with LOC) and are measured on a sprint-by-sprint basis, with instant feedback into the following sprints.
No, it isn't. The reason is simple: if you produce a new line of code during your development, are you one step closer to a solution? If you estimated 1000 lines of code to complete a task, are you now 0.1% complete with that task?
Lines of code can be used as a metric but only in the negative sense: for a greater number of lines of code, it is reasonable to assume that you have a greater number of bugs. Based on historical data, there is generally a linear correlation between lines of code and bug count.
Here are some useful and measurable factors that are worth considering:
Hours of labor.
Dollars spent: this is a good one because it strongly enforces the reality that you'd rather find bugs at the developer's desktop than in the hands of a tester or customer.
Milestones met: is the system available for the customers on the right date?
Requirements completed: this can be a funny one - what if you discover a new customer need during the project?
In short, lines of code is very nearly the worst possible metric you could ever use.
The only way to get any reasonable estimate on project duration is to COMPLETELY implement and deliver some subset of the final requirements. Then you can estimate the remaining requirements by comparing their complexity against the completed work.

How do I estimate tasks using function points?

What are the steps to estimating using function points?
Is there a quick-reference guide of some sort out there?
I took a conference session on Function Point Analysis a few years back. There is a lot to it. You can check out the Free Function Point Training Manual online, the Fundamentals of Function Points, or I suspect you can get a book on it at a computer store.
You might also check out the International Function Point Users Group and see if they have some resources or a local meeting for you.
You really need to get some training on it. Check with IFPUG. You will unknowingly pick up some destructive bad habits if self-taught. It also helps to have an experienced FP analyst review some of your early attempts.
It's the kind of thing that appears overwhelmingly complex until you "get it" and then it's fairly quick to do. It improved my requirements analysis a lot too. I often spot contradictions and gaps when doing a count.
It isn't limited to BDUF Waterfall projects either. I spent three years using FP and Planning Poker as cross-checks on one another when contracting agile methods projects.
I was IFPUG-certified from 2002-2005 and am still using FP analysis. I've seen it misused a lot, and I think that's why it has such a bad reputation.
I recommend you take a look at COSMIC Function points. https://cosmic-sizing.org. COSMIC Function points are also an ISO standard for measuring software size. They are an evolved improvement over IFPUG.
You can quickly estimate size by counting the entries, exits, reads and writes.
Compared with the IFPUG manual, learning COSMIC is much easier, the free book below is all you need, and you can read it in a day.
Recommended reading: https://cosmic-sizing.org/publications/measurement-guide/
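To make the "entries, exits, reads and writes" point concrete: in COSMIC each data movement counts as one COSMIC Function Point (CFP), so a first-pass size estimate is just a sum. A toy sketch in Python, not a substitute for the measurement guide above:

    # Toy COSMIC sizing: each data movement (Entry, eXit, Read, Write) = 1 CFP.
    def cosmic_cfp(entries, exits, reads, writes):
        return entries + exits + reads + writes

    # Example functional process: "record a match result".
    # Entry: the result form; eXit: a confirmation message;
    # Reads: the two team records; Write: the updated standings.
    print(cosmic_cfp(entries=1, exits=1, reads=2, writes=1))   # 5 CFP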