Designing a chatbot to ask the right questions in order to guess an object - chatbot

I am trying to design a chatbot that asks the minimum number of questions needed to guess the object a human user has in mind.
Let's suppose I have a database such as this:
Q1 Q2 Q3 Q4 Label
1 0 0 1 Apple
0 0 0 1 Apple
0 1 1 1 Mango
Here Q1 is "Is it red in color?", Q2 is "Is it soft?", and Q4 is "Is it sweet?". A real database could contain thousands of possible questions and thousands of possible labels.
The chatbot asks a question and the user answers yes or no. For example:
A user starts the game with an apple in mind.
Chatbot asks the first question: "Is it red?"
Human: yes
Chatbot calculates the probabilities so far: Apple 60%, Mango 0%.
Since Apple now has the highest probability, it asks a question that helps confirm whether it really is an apple: "Is it sweet?"
Human: yes
Chatbot updates the probabilities: Apple 90%, Mango 40%.
Once the probability is high enough, the chatbot stops and prints all the probabilities it has.
First question: I need the probability of each label when only a few questions have been answered. Say I have asked only the first question, "Is it red?". The chatbot needs to know the probability of each item given what we know so far:
60% = Prob(Apple, [1 ? ? ?])
I am not sure how to feed a feature vector with unknowns into a model and ask it to predict.
Second question: I want to ask as few questions as possible, so ideally each question should discriminate well between the candidates. How do I determine which question to ask next?
It seems that traditional machine learning models do not work here. How would you design a system like this?

I found the solution to the problem. The answer is naive Bayes.
The math can be found here
In the pseudocode below, x is a feature (a question) and y is an object (a label):
while True:
    # sort the objects by probability
    # print the objects with their probabilities
    # top_object = the object with the highest probability
    # pick the unasked x for which P(x=1|top_object) is highest, and ask "does that object have x?"
    # wait for input from the user
    # for each object y, update its probability: prob = prob * P(x=input|y)
    # remove x from the features that can still be asked
    # if no features are left, quit
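The loop above can be sketched in Python on the three-row table from the question. This is only a sketch under stated assumptions: the function names (fit, posterior, next_question) are made up for illustration, and Laplace smoothing with alpha=1 is my own choice so that a single "wrong" answer does not drive a probability to exactly zero. Unanswered questions are handled by leaving them out of the product, which is what makes naive Bayes fit the "[1 ? ? ?]" case.

```python
from collections import defaultdict

# Rows from the question: answers to Q1..Q4 (1 = yes, 0 = no) and the label.
ROWS = [
    ([1, 0, 0, 1], "Apple"),
    ([0, 0, 0, 1], "Apple"),
    ([0, 1, 1, 1], "Mango"),
]


def fit(rows, alpha=1.0):
    """Estimate P(y) and P(x_j = 1 | y); alpha is an assumed Laplace smoothing term."""
    n_features = len(rows[0][0])
    counts = defaultdict(int)
    yes = defaultdict(lambda: [0] * n_features)
    for features, label in rows:
        counts[label] += 1
        for j, value in enumerate(features):
            yes[label][j] += value
    prior = {y: n / len(rows) for y, n in counts.items()}
    p_yes = {
        y: [(yes[y][j] + alpha) / (counts[y] + 2 * alpha) for j in range(n_features)]
        for y in counts
    }
    return prior, p_yes


def posterior(prior, p_yes, answers):
    """answers maps question index -> 0/1; unanswered questions are simply skipped."""
    scores = {}
    for y in prior:
        score = prior[y]
        for j, answer in answers.items():
            score *= p_yes[y][j] if answer == 1 else 1.0 - p_yes[y][j]
        scores[y] = score
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}


def next_question(p_yes, post, asked):
    """Heuristic from the answer: the unasked x with the highest P(x=1 | top object)."""
    top = max(post, key=post.get)
    candidates = [j for j in range(len(p_yes[top])) if j not in asked]
    if not candidates:
        return None
    return max(candidates, key=lambda j: p_yes[top][j])


prior, p_yes = fit(ROWS)
post = posterior(prior, p_yes, {0: 1})          # the user answered "yes" to Q1
print(post)
print(next_question(p_yes, post, asked={0}))    # index of the next question to ask
```

With these three rows and alpha=1, answering "yes" to Q1 gives Apple a posterior of about 0.75, and the next question chosen is index 3 (Q4, "Is it sweet?"), which matches the flow described in the question. A rule based on expected information gain would usually need fewer questions than this heuristic, but that is a different technique from the one in the answer.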

Netlogo: Built-in function to calculate the expected profit

Sorry for the long post. I am a newbie in agent-based modelling, so please accept my apology in advance if my question sounds stupid. I am trying to model a scenario where a farmer (i.e. the agent) decides which type of crop to grow in different types of fields in order to increase profit. The farmer has a budget of $100, i.e. the amount of money that can be spent on farming each time step.
The farmer operates a farm that is subdivided into nine fields, arranged in a 3x3 cellular grid. Each field is the same size. Water availability varies randomly across the fields, with a rating of 1 (driest), 2 (moderate), or 3 (wettest).
The farmer must choose among three crops. As initial parameter settings, the crops have the following characteristics:
        Yield  Price  Costs  Minimum Water Req.
Crop 1  300    20     15     3
Crop 2  200    12     10     2
Crop 3  100    7      5      1
Each crop requires a certain amount of water to grow. Crop yields will only be realized if the crop is planted in a field with at least the crop's minimum water requirement.
Now the problem is that I couldn't find any function in NetLogo that enumerates the permutations or combinations of crop, field, and water requirement in order to calculate the expected profit. Any help would be highly appreciated.
I believe you describe a linear programming problem.
Useful functions for solving linear programming problems with the simplex method are in the NumAnal extension, which does not come bundled with NetLogo but which you can get as follows:
In NetLogo, under Tools / Extensions ... you can find NumAnal, probably with no green check-mark. Select it. On the right, you have buttons to install it, and then one to add it to your code. When you click those, it should now get a green checkmark and you should have a new line in your code "extensions [ numanal ]", and you are now able to use those commands, with the "numanal:" prefix, for example, numanal:simplex.
The documentation for it is in the folder where it was installed. But where is that?
Sadly, the documentation for where extensions are downloaded is not current.
https://ccl.northwestern.edu/netlogo/docs/extensions.html#where-extensions-are-located
After exhaustive search by date-modified, I actually found the folder on my Windows 10 laptop here: c:\Users\condor\AppData\Roaming\NetLogo\6.1\extensions
( Note the "\Roaming\" ).
That folder has a README.md text file, and a pdf document named "NumAnal-v3.4.0" explaining how to use it, and an examples folder with code. It is a little dense.
Here's a link to the basics of how to describe a Linear Programming problem, which is beyond the scope of StackOverflow. You can find help via Google.
Here's one 8 minute video ( as of 24-Nov-2019) that might help you figure out if this is what you need.
Simplex Algorithm Explanation (How to Solve a Linear Program)
https://www.youtube.com/watch?v=RO5477EKlXE
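With only nine fields, the crop-to-field combinations can also be enumerated directly instead of using simplex. Below is a rough sketch in Python (not NetLogo) of that brute-force idea. Several things here are assumptions, because the question does not state them: profit per field is taken as yield * price - cost, a crop planted in a field that is too dry earns nothing but still costs money, costs are per field and counted against the $100 budget, and a field may be left fallow. The names field_profit and best_plan are hypothetical.

```python
from itertools import product

# name: (yield, price, cost, minimum water), taken from the table in the question
CROPS = {
    "Crop 1": (300, 20, 15, 3),
    "Crop 2": (200, 12, 10, 2),
    "Crop 3": (100, 7, 5, 1),
}


def field_profit(crop, water):
    """Profit of one field. Assumption: revenue = yield * price, lost if the field is too dry."""
    if crop is None:                      # field left fallow
        return 0
    crop_yield, price, cost, min_water = CROPS[crop]
    revenue = crop_yield * price if water >= min_water else 0
    return revenue - cost


def best_plan(water_by_field, budget=100):
    """Try every crop assignment that fits the budget and keep the most profitable one."""
    options = [None] + list(CROPS)
    best_profit, best = None, None
    for plan in product(options, repeat=len(water_by_field)):
        spent = sum(CROPS[crop][2] for crop in plan if crop is not None)
        if spent > budget:
            continue
        profit = sum(field_profit(c, w) for c, w in zip(plan, water_by_field))
        if best_profit is None or profit > best_profit:
            best_profit, best = profit, plan
    return best_profit, best


# A made-up 3x3 water map, flattened row by row.
water = [3, 1, 2, 2, 3, 1, 1, 2, 3]
print(best_plan(water))
```

For nine fields this checks 4^9 = 262,144 assignments, which is still quick. For a larger grid the enumeration grows too fast, and the linear programming route described above (or a dynamic program over the budget) becomes the better choice.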

How to train an ANN to play a card game?

I would like to teach an ANN to play Hearts, but I am stuck on how to actually perform the training.
A friend suggested to use weka for the implementation of the actual ANN, but I've never used it, so I'm leaning towards a custom implementation.
I have programmed the rules and I can let the computer play a game, choosing random but legal cards each turn.
Now I am at a loss of what to send to the ANN as input and how to extract output (decreasing amount of cards each turn, so I can't let each output neuron be a possible card) and how to teach it and when to perform teaching.
My guess is to give the ANN as input:
The cards that have been played previously, with metadata of which player has played which card
The cards on the table for this turn, also with the same metadata
The cards in the ANN's hand
And then have the output be 13 neurons (the maximal amount of cards per player), of which I take the most activated of the cards that still are in the ANN's hand.
I also don't really know when to teach it (after each turn or after each game), as it is beneficial to have all the penalty cards, but bad to have all but one penalty card.
Any and all help is appreciated. I don't really know where else to put this question.
I currently have it programmed in Swift, but it's only 200 lines and I know a few other languages, so I can translate it.
Note that neural networks might not be the best thing to use here. More on that at the end of the answer; I'll answer your questions first.
Now I am at a loss of what to send to the ANN as input and how to extract output (decreasing amount of cards each turn, so I can't let each output neuron be a possible card) and how to teach it and when to perform teaching.
ANNs require labeled input data. This means a pair (X, y) where X can be whatever (structured) data related to your problem and y is the list of correct answers you expect the ANN to learn for X.
For example, think about how you would learn math in school. The teacher will do a couple of exercises on the blackboard, and you will write those down. This is your training data.
Then, the teacher will invite you to the blackboard to do one on your own. You might not do so well at first, but he/she will guide you in the right direction. This is the training part.
Then, you'll have to do problems on your own, hopefully having learnt how.
The thing is, even this trivial example is much too complex for an ANN. An ANN usually takes in real-valued numbers and outputs one or more real-valued numbers. So it's actually much dumber than a grade schooler who learns about ax + b = 0 type equations.
For your particular problem, it can be hard to see how it fits in this format. As a whole, it doesn't: you can't present the ANN with a game and have it learn the moves; that is much too complex. You need to present it with something that has a correct numerical label associated with it, so that the ANN can learn the underlying pattern.
To do this, you should break your problem up into subproblems. For example, input the current player's cards and expect as output the correct move.
The cards that have been played previously, with metadata of which player has played which card
The ANN should only care about the current player. I would not use metadata or any other information that identifies the players.
Giving it a history could get complicated. You might want recurrent neural networks for that.
The cards on the table for this turn, also with the same metadata
Yes, but again, I wouldn't use metadata.
The cards in the ANN's hand
Also good.
Make sure you have as many input units as the MAXIMUM number of cards you want to input (2 x total possible cards, for the cards in hand and those on the table). This will be a binary vector where the ith position is true if the card corresponding to that position exists in hand / on the table.
Then do the same for moves: you will have m binary output units, where the ith will be true if the ANN thinks you should do move i, where there are m possible moves in total (pick the max if m depends on stages in the game).
Your training data will also have to be in this format. For simplicity, let's say there can be at most 2 cards in hand and 2 on the table, out of a total of 5 cards, and we can choose from 2 moves (say fold and all in). Then a possible training instance is:
Xi = 1 0 0 1 0 0 0 0 1 1 (meaning cards 1 and 4 in hand, cards 4 and 5 on table)
yi = 0 1 (meaning you should go all in in this case)
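The encoding above can be sketched in Python as follows. The helper names (encode_cards, encode_state, pick_legal) are made up for illustration, and cards are assumed to be numbered from 1 as in the example. pick_legal shows one way to handle the shrinking hand mentioned in the question: keep one output unit per move and take the most activated unit among the moves that are still legal.

```python
def encode_cards(cards, total_cards):
    """Binary vector with a 1 at position i-1 for every card i that is present."""
    present = set(cards)
    return [1 if i in present else 0 for i in range(1, total_cards + 1)]


def encode_state(hand, table, total_cards):
    """Concatenate the hand vector and the table vector (2 x total_cards inputs)."""
    return encode_cards(hand, total_cards) + encode_cards(table, total_cards)


def pick_legal(outputs, legal_moves):
    """Index of the most activated output unit among the legal moves only."""
    return max(legal_moves, key=lambda i: outputs[i])


# The training instance from the example: cards 1 and 4 in hand, 4 and 5 on the table.
x_i = encode_state(hand=[1, 4], table=[4, 5], total_cards=5)
print(x_i)
```

This reproduces the vector 1 0 0 1 0 0 0 0 1 1 from the example. For a real deck, total_cards would be 52 and the input layer would have 104 units.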
I also don't really know when to teach it (after each turn or after each game), as it is beneficial to have all the penalty cards, but bad to have all but one penalty card.
You should gather a lot of labeled training data in the format I described, train it on that, and then use it. You will need thousands or even tens of thousands of games to see good performance. Teaching it after each turn or game is unlikely to do well.
This will lead to very large neural networks. Another thing that you might try is to predict who will win given a current game configuration. This will significantly reduce the number of output units, making learning easier. For example, given the cards currently on the table and in hand, what is the probability that the current player will win? With enough training data, neural networks can attempt to learn these probabilities.
There are obvious shortcomings: the need for large training data sets, and no memory of how the game has gone so far (unless you use much more advanced nets).
For games such as these, I suggest you read about reinforcement learning, or dedicated algorithms for your particular game. You're not going to have much luck teaching an ANN to play chess, for example, and I doubt you will have much more teaching it to play a card game.
First of all, you need to create a good data set for training the ANN. If your budget allows, you can ask professional card players to share enough records of matches they have played. Another way of generating data could be to use bots that play cards. Then you need to think about how to represent the data set of played matches to the neural network. I also recommend that you represent cards not by a single value (0.2, 0.3, 0.4, ..., 0.10, 0.11 (for jack)), but as separate inputs. Also look into elastic neural networks, which can be used for such a task.

Inter annotator agreement when users annotates more than one category for any subject

I want to find the inter-annotator agreement for a few annotators.
Each annotator assigns a few categories (out of 10 categories) to each subject.
For example, there are 3 annotators, 10 categories, and 100 subjects.
I am aware of http://en.wikipedia.org/wiki/Cohen's_kappa (for two annotators) and http://en.wikipedia.org/wiki/Fleiss%27_kappa (for more than two annotators) as measures of inter-annotator agreement, but I realized that they may not work if an annotator assigns more than one category to a subject.
Does anyone have any idea how to determine inter-annotator agreement in this scenario?
Thanks
I had to do this several years back. I can't recall exactly how I did it (I don't have the code anymore), but I have a worked example that I reported to my professor. I was dealing with annotation of comments and had 56 categories and 4 annotators.
Note: at the time I needed a way to detect where annotators most disagreed, so that after each annotation session they could focus on why they disagreed and set out reasonable rules to maximize this statistic. It worked well for that purpose.
Let's assume A-D are annotators and 1-5 are categories. This is a possible scenario.
    A  B  C  D  Probability of agreement
1   X  X  X  X  4/4
2   X  X  X     3/4
3   X  X        2/4
4   X           1/4
5
A tags this comment with categories 1, 2, 3 and 4; B with 1, 2 and 3; and so forth.
For each category the probability of agreement is calculated.
These probabilities are summed (4/4 + 3/4 + 2/4 + 1/4 = 10/4) and the sum is divided by the number of unique categories tagged for that particular comment (4 here).
Therefore, for the example comment, we have 10/16 as the annotators' agreement. This is a value between 0 and 1.
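The calculation above can be written as a short Python function. This is a reconstruction from the worked example, not the original code (which is lost); the function name and the choice to return None when nobody tagged anything are my own assumptions.

```python
def comment_agreement(tags_by_annotator):
    """Mean, over the categories anyone used, of the fraction of annotators who used it."""
    n_annotators = len(tags_by_annotator)
    tagged = set().union(*tags_by_annotator.values())
    if not tagged:
        return None  # assumption: agreement is undefined when no category was tagged
    per_category = [
        sum(category in tags for tags in tags_by_annotator.values()) / n_annotators
        for category in tagged
    ]
    return sum(per_category) / len(tagged)


example = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {1, 2}, "D": {1}}
print(comment_agreement(example))
```

For the example comment this gives 0.625, i.e. 10/16. Averaging the value over all comments gives one overall figure, and sorting comments by it shows where the annotators disagree most.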
If this doesn't work for you, then see (http://www.mitpressjournals.org/doi/pdf/10.1162/coli.07-034-R2), p. 567, which is referenced by the case study on p. 587.
Compute agreement on a per-label basis. If you treat one of the annotators as the gold standard, you can then compute recall and precision on label assignments. Another option is label overlap: of all the label assignments made by either annotator, the proportion that were made by both (intersection over union).
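A minimal sketch of both measures in Python, assuming each annotator's work is stored as a set of (subject, category) pairs. The function names and the toy data are made up for illustration.

```python
def precision_recall(gold, other):
    """Precision and recall of `other`, treating `gold` as the gold standard."""
    both = len(gold & other)
    precision = both / len(other) if other else 0.0
    recall = both / len(gold) if gold else 0.0
    return precision, recall


def label_overlap(a, b):
    """Intersection over union of two annotators' label assignments."""
    union = a | b
    if not union:
        return 1.0  # assumption: two empty annotations count as full agreement
    return len(a & b) / len(union)


# Toy data: subject ids and category names are invented.
gold = {("s1", "x"), ("s1", "y"), ("s2", "x")}
other = {("s1", "x"), ("s2", "x"), ("s2", "z")}
print(precision_recall(gold, other))
print(label_overlap(gold, other))
```

To get per-label figures, filter both sets to a single category before calling the functions. With more than two annotators, the overlap can be averaged over all pairs.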

Can an artificial neural network predict the outcome of sports games? [closed]

I was trying to find something original and fun to do with artificial neural networks (ANNs) as a personal/learning project, and I thought it would be cool if I could predict the results of sports games (especially NHL games).
I'm pretty sure it would be easy to evolve an ANN that can predict which team is most likely to win (usually the team with the better record). However, what I would like to do is create an ANN that would tell how likely the outcome is, similar to bookmaker odds.
Is this something an ANN can do? If so, what kind of success can I expect? I know I can't beat the bookmaker (at least not with a software solution). I want to do this as a recreational project/challenge to myself. I don't expect to bet money on sports games with this project.
Way back in the days of the IBM XT I played with a shareware ANN program to try to improve my chances on the British football (soccer) pools. This is a form of betting where you try to predict which football matches will result in draws. I assigned each team a number, then looked back through past results and from them generated a single digit for the result. From memory it was 0 for a home win, 1 for an away win and 2 for a draw. Each result went on a single line in a training file. I would then run the training file through the program and generate the ANN settings. I would then look up the following Saturday's matches, feed them into the ANN, and look for matches predicted as draws.
As the weeks went on my predictions of draws did definitely become more and more accurate. However ...
1) The XT was so slow that by Christmas it was taking 24 hours to generate the ANN settings from the training data. I really had better things to do with my precious (and expensive) PC.
2) Although it was better at predicting draws it wasn't predicting enough to actually win any money. Looking back I suppose the program had just worked out that Manchester United would always beat Sheffield United. This was more football knowledge than I had but not enough to win any money.
3) Entering the results into the training data and then generating the forthcoming matches data was taking me ages and to be honest sport bores me rigid.
So I gave up and didn't become a millionaire.
These days, however, PCs are much faster and much of the training data could be scraped from the web. I still doubt it is a route to a fortune, but it's certainly an interesting project.
Ian
A reply above stated:
I know that if the bookmakers odds could be beaten by an ANN,
bookmakers would already be using one to fix their odds.
Bookmakers don't set the line based on their analysis of the teams; they set it based on their analysis of the betting public's opinion of the teams. An ideal line for the bookie is one where he has exactly the same amount bet on each side of the line; then he is guaranteed a profit equal to the 'juice' on the losers' bets. Bookies move the line as the game approaches to try to keep that 50/50 split. A bookie may think Home team -5 is the accurate line based on game analysis, but if he expects that line to draw twice as much money on the Home team, he will not set it at -5. He will set it at -7 or -8, to where he expects to draw equal money on both sides of the line.
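To make the "guaranteed profit on a balanced book" point concrete, here is a small Python sketch. It assumes the common pricing where a bettor risks $110 to win $100 on either side; that pricing and the function name are my assumptions, not something stated in the reply above.

```python
def bookie_profit(stake_home, stake_away, winner, risk=110, win=100):
    """Bookmaker's profit once the game is settled.

    Assumption: every bettor risks `risk` to win `win`, on either side of the line.
    """
    collected = stake_home + stake_away
    winning_stake = stake_home if winner == "home" else stake_away
    payout = winning_stake + winning_stake * win / risk   # stakes returned plus winnings
    return collected - payout


# Balanced book: the profit is the same whichever side wins.
print(bookie_profit(1100, 1100, "home"), bookie_profit(1100, 1100, "away"))
# Twice as much money on the home team: the bookie is now gambling on the result.
print(bookie_profit(2200, 1100, "home"), bookie_profit(2200, 1100, "away"))
```

With $1,100 staked on each side the bookie keeps $100 whatever happens. With $2,200 on the home team and $1,100 on the away team he loses $900 if the home team covers and makes $1,200 if it does not, which is why he would rather move the line than carry that risk.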
ANNs are really good at pattern matching and prediction, so yes, odds are you could build an ANN that does what you want.
You'll need more than just team win/loss ratio to make it really effective however. Feed it stats for the players, too. For real effectiveness, try to include game-flow information... like which players are on the line for each play (for football, for example).
Ultimately, the biggest problem you'll run into (aside from the whole "writing the ANN" issue) is getting the data you need to feed it.
I've done some stock market predictions with an AI and my conclusion is that it is not very hard to make an AI that gets good results with the historical data.
Making winning transactions in the future is a different ballgame.
I have just worked on this very problem (predicting English Premier League games) for the past 10 days, and ended up with very similar results using 3 different methods: SVM, Logistic Regression, and NN.
LR and NN will give probabilities. SVM outputs 0/1, but it can be tweaked to give probabilities too (I haven't tried yet).
I needed a "massive" (by my standards at least) feature set though (almost 300) and a good chunk of data (13 years worth).
Re. data, I got it from the web, simply.
Conclusion: I can just about match the bookies in terms of accuracy (predicting victories in my case). If I add the pre-match odds to the feature set, I get the exact same accuracy as the bookies (as expected), but no better (surely meaning my feature set is summarized in the bookies odds, and they have a little extra knowledge on top).
I'm sure there is a way to get better accuracy, either by improving the algos, or more likely by having extremely granular data (as in which players play which games, for how many minutes, and a lot of player-level historical stats, so as to build bottom-up models of team performance).
But bottom line is I can testify NNs work quite well for that purpose. SVM is slightly better though, in my limited experience.
I think it's indeed all about data, but there's no end to what you could feed it in order to be more accurate: winning/losing streaks, players' biorhythms, the mood of players' girlfriends before the game, minor/major injuries they suffered in the recent past, events outside sport that are bothering the players, etc.
But I don't think you can accurately predict which team is more likely to win, it would be just a more-or-less educated guess.
In my opinion and experience, because of the excessively large number of factors in play, designing and training the ANN will be unreasonably complex and time-consuming. ANNs are good at pattern matching, and game prediction takes much deductive reasoning rather than mere pattern matching.
But if you want to enjoy learning neural networks, it will be a good adventure. If you are successful, you might want to host your code somewhere for others to see and learn!
For game prediction, it would be much easier and faster with decision trees or a rules engine and so on. This will be no easy task either, but it will be another interesting activity.
My belief is that the unpredictability of an event is due to lack of information and understanding...If you have all the knowledge, then yes it could be done. Or, the more knowledge you have, the better it can be done.
So in theory, the answer is yes.
However, in practice, you can get a PhD and have a whole career working on this question and you still may not succeed.

Simulating sports matches in online game

In an online manager game (like Hattrick), I want to simulate matches between two teams.
A team consists of 11 players. Every player has a strength value between 1 and 100. I take these strength values of the defensive players for each team and calculate the average. That's the defensive quality of a team. Then I take the strengths of the offensive players and I get the offensive quality.
For each attack, I do the following:
$offFactor = ($attackerTeam_offensive-$defenderTeam_defensive)/max($attackerTeam_offensive, $defenderTeam_defensive);
$defFactor = ($defenderTeam_defensive-$attackerTeam_offensive)/max($defenderTeam_defensive, $attackerTeam_offensive);
At the moment, I don't know why I divide it by the higher one of both values. But this formula should give you a factor for the quality of offense and defense which is needed later.
Then I have nested conditional statements for each event which could happen. E.g.: Does the attacking team get a scoring chance?
if ((mt_rand((-10+$offAdditionalFactor-$defAdditionalFactor), 10)/10)+$offFactor >= 0)
{ ... // the attack succeeds
These additional factors could be tactical values for example.
Do you think this is a good way of calculating a game? My users say that they aren't satisfied with the quality of the simulations. How can I improve them? Do you have different approaches which could give better results? Or do you think that my approach is good and I only need to adjust the values in the conditional statements and experiment a bit?
I hope you can help me. Thanks in advance!
Here is a way I would do it.
Offensive/Defensive Quality
First let's work out the average strength of the entire team:
Team.Strength = SUM(Players.Strength) / 11
Now we want to split our side in two, and work out the average for our defensive players and for our offensive players.
Defense.Strength = SUM(Defensive_Players.Strength)/Defensive_Players.Count
Offense.Strength = SUM(Offense_Players.Strength)/Offense_Players.Count
Now we have three values. The first, our team average, is going to be used to calculate our odds of winning. The other two are going to be used to calculate our odds of defending and our odds of scoring.
A team with a high offensive average is going to have more chances; a team with a high defensive average is going to have a better chance of saving.
Now suppose we have two teams; let's call them A and B.
Team A has an average of 80, an offensive score of 85 and a defensive score of 60.
Team B has an average of 70, an offensive score of 50 and a defensive score of 80.
Now, based on the average, Team A should have a better chance of winning. But by how much?
Scoring and Saving
Let's work out how many goals Team A should score:
A.Goals = (A.Offensive / B.Defensive) + RAND()
= (85/80) + 0.8
= 1.8625
I have assumed the random value adds anything between -1 and +1, although you can adjust this.
As we can see, the formula indicates Team A should score about 1.86 goals. We can either round this up or down, or give Team A 1 goal and use a random chance to decide whether the second one is allowed.
Now for Team B
B.Goals = (B.Offensive / A.Defensive) + RAND()
= (50/60) + 0.2;
= 1.03
So we have A scoring 1 and B scoring 1. But remember, we want to weight this in A's favour, because, overall, they are the better team.
So what is the chance A will win?
Chance A Will Win = (A.Average / B.Average)
= 80 / 70
= 1.14
So we can see the odds are 14% (.14) in favor of A winning the match. We can use this value to see if there is any change in the final score:
if Rand() <= 0.14 then Final Score = A 2 - 1 B Otherwise A 1 - 1 B
If our random number was 0.8, then the match is a draw.
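The outline so far can be sketched in Python. The sketch follows the formulas above but fills a few gaps with assumptions of my own: fractional goals are rounded down, the score cannot go below zero, and the bonus goal goes to whichever team has the higher average. The function names and the dictionary layout are hypothetical.

```python
import random

TEAM_A = {"average": 80, "offense": 85, "defense": 60}
TEAM_B = {"average": 70, "offense": 50, "defense": 80}


def raw_goals(offense, defense, noise):
    """Goal figure from the outline: offense/defense ratio plus a random term."""
    return offense / defense + noise


def simulate_match(a, b, rng):
    # Assumption: fractional goals are rounded down and never drop below zero.
    goals_a = max(0, int(raw_goals(a["offense"], b["defense"], rng.uniform(-1, 1))))
    goals_b = max(0, int(raw_goals(b["offense"], a["defense"], rng.uniform(-1, 1))))
    # The better team overall gets a chance of one extra goal (80/70 - 1 is about 0.14).
    if a["average"] >= b["average"]:
        if rng.random() <= a["average"] / b["average"] - 1:
            goals_a += 1
    elif rng.random() <= b["average"] / a["average"] - 1:
        goals_b += 1
    return goals_a, goals_b


rng = random.Random(2024)   # seeded only so that repeated runs give the same results
for _ in range(5):
    print(simulate_match(TEAM_A, TEAM_B, rng))
```

Passing the random number generator in as an argument makes it easy to replay a match with the same seed when players question a result.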
Rounding Up and Further Thoughts
You will definitely want to play around with the values. Remember, game mechanics are very hard to get right. Talk to your players, ask them why they are dissatisfied. Are there teams always losing? Are the simulations always stagnant? etc.
The above outline is deeply affected by the randomness of the selection. You will want to normalise it so that a team scoring an extra 5 goals is very, very rare. But a little randomness is a great way to add some variety to the game.
There are ways to edit this method as well. For example, instead of the number of goals, you could use the goal figure as the number of scoring chances, and then have another function that works out the number of goals based on other factors (i.e. choose a random striker, and use that player's individual stats, and the goalie's, to work out whether there is a goal).
I hope this helps.
The most basic tactical decision in football is picking formation, which is a set of three numbers which assigns the 10 outfield players to defence, midfield and attack, respectively, e.g. 4/4/2.
If you use average player strength, you don't merely lose that tactic, you have it going backwards: the strongest defence is one with a single very good player, giving him any help will make it more likely the other team score. If you have one player with a rating of 10, the average is 10. Add another with rating 8, and the average drops (to 9). But assigning more people to defence should make it stronger, not weaker.
So first thing, you want to make everything be based on the total, not the average. The ratio between the totals is a good scale-independent way of determining which team is stronger and by how much. Ratios tend to be better than differences, because they work in a predictable way with teams of any range of strengths. You can set up a combat results table that says how many goals are scored (per game, per half, per move, or whatever).
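The difference between averaging and totalling is easy to see in a few lines of Python, using the ratings 10 and 8 from the example above. The attacker rating of 9 and the function names are made up for illustration.

```python
def average_strength(ratings):
    return sum(ratings) / len(ratings)


def total_strength(ratings):
    return sum(ratings)


def strength_ratio(own, opponent):
    """Ratio of totals: above 1 means `own` is stronger, below 1 means weaker."""
    return total_strength(own) / total_strength(opponent)


one_defender = [10]
two_defenders = [10, 8]
attackers = [9]          # hypothetical opposing attack

# The average says that adding a defender weakens the defence; the total says the opposite.
print(average_strength(one_defender), average_strength(two_defenders))
print(total_strength(one_defender), total_strength(two_defenders))
print(strength_ratio(two_defenders, attackers))
```

The ratio could then be used as the lookup key for the results table mentioned above.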
The next tactical choice is whether it is better to have one exceptional player or several good ones. You can make that matter by setting up scenarios that represent things that happen in a game, e.g. a 1 on 1, a corner, or a long ball. The players involved in a scenario are first randomly chosen, then the result of the scenario is rolled for. One result can be that another scenario starts (midfield pass leads to cross leads to header chance).
The final step, which would bring you pretty much up to the level of actual football manager games, is to give players more than one type of strength rating, e.g., heading, passing, shooting, and so on. Then you use the strength rating appropriate to the scenario they are in.
The division in your example is probably a bad idea, because it changes the scale of the output variable depending on which side is better. Generally when comparing two quantities you either want interval data (subtract one from the other) or ratio data (divide one by the other) but not both.
A better approach in this case would be to simply divide the offensive score by the defensive score. If both are equal, the result will be 1. If the attacker is better than the defender, it will be greater than 1, and if the defender is stronger, it will be less than one. These are easy numbers to work with.
Also, instead of averaging the whole team, average parts of the team depending on the formations or tactics used. This will allow teams to choose to play offensively or defensively and see the pros and cons of this.
And write yourself some better random number generation functions. One that returns floating point values between -1 and 1 and one that works from 0 to 1, for starters. Use these in your calculations and you can avoid all those confusing 10s everywhere!
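The question's code is PHP, but the idea is the same in any language; here is a sketch in Python. The two helpers return floats in the ranges suggested above. attack_succeeds is only one possible way to use the ratio: turning it into a probability with ratio / (1 + ratio) is my own assumption, chosen because equal teams then get exactly 50%.

```python
import random


def rand_unit(rng=random):
    """Float in [0, 1)."""
    return rng.random()


def rand_signed(rng=random):
    """Float between -1 and 1."""
    return rng.uniform(-1.0, 1.0)


def attack_ratio(offense, defense):
    """1 means evenly matched, above 1 favours the attacker, below 1 the defender."""
    return offense / defense


def attack_succeeds(offense, defense, rng=random):
    # Assumption: success probability = ratio / (1 + ratio), which is 0.5 for equal teams.
    ratio = attack_ratio(offense, defense)
    return rand_unit(rng) < ratio / (1 + ratio)


rng = random.Random(7)   # seeded only to make the run repeatable
print(attack_ratio(85, 80))
print(sum(attack_succeeds(85, 80, rng) for _ in range(1000)))
```

Tactical modifiers can then be applied to the offense and defense values before the ratio is taken, instead of being added to the bounds of the random call.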
You might also want to ask the users what about the simulation they don't like. It's possible that, rather than seeing the final outcome of the game, they want to know how many times their team had an opportunity to attack but the defense regained control. So instead of
"Your team wins 2-1"
They want to see match highlights:
"Your team wins 2-1:
- scored at minute 15,
- other team took control and tried for a goal at minute 30,
but the shot was intercepted,
- we took control again and $PLAYER1 scored a beautiful goal!
... etc"
You can use something like what Jamie suggests for a starting point, choose the times at random, and maybe pick who scored the goal based on a weighted sampling of the offensive players (i.e. a player with a higher score gets a higher chance of being the one who scored). You can have fun and add random low-probability events like a red card on a player, someone injuring themselves, streakers across the field...
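Weighted sampling of the scorer and random event times can be sketched with Python's standard library. The player names, their strength values and the function names are made up for illustration.

```python
import random


def pick_scorer(offensive_players, rng=random):
    """Pick one player at random, with probability proportional to strength."""
    names = list(offensive_players)
    weights = [offensive_players[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def highlight_minutes(n_events, rng=random, match_length=90):
    """Distinct random minutes for the highlights, in chronological order."""
    return sorted(rng.sample(range(1, match_length + 1), n_events))


# Hypothetical offensive players and their strength values.
strikers = {"Striker A": 90, "Striker B": 60, "Winger C": 30}
rng = random.Random(11)   # seeded only to make the run repeatable
for minute in highlight_minutes(3, rng):
    print(f"minute {minute}: {pick_scorer(strikers, rng)} scores")
```

Here Striker A is three times as likely to be named as Winger C. Low-probability events such as red cards can be added in the same way by giving them a small weight in a list of event types.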
The divisor should be the number of players. Using the max means that, if you have 3-player teams such as:
[4 4 4]
[7 4 1]
The second one would be considered weaker. Is that what you want? I think you would rather do something like:
(Total Scores / Total Players) + (Max Score / Total Players), so in the above example it would make the second team slightly better.
I guess it depends on how you feel the teams should be balanced.
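As a quick check of the formula above on the two example teams, in Python (the function name is made up):

```python
def team_rating(strengths):
    """(Total Scores / Total Players) + (Max Score / Total Players)."""
    n = len(strengths)
    return sum(strengths) / n + max(strengths) / n


print(team_rating([4, 4, 4]))
print(team_rating([7, 4, 1]))
```

Both teams have the same average of 4, so the difference comes only from the best player: the first team rates at about 5.33 and the second at about 6.33.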