How is Conway's Game of Life implemented in Conway's Game of Life? - conways-game-of-life

I just came across http://www.youtube.com/watch?feature=player_embedded&v=xP5-iIeKXE8 , which is an implementation of Conway's Game of Life in ... Conway's Game of Life.
I think it is possible to do this in theory because the Game of Life is Turing complete, but how is it implemented in this case?

jwz has a recent blog entry that talks about this construction: Turtles, all the way down. Or gliders. Or glider turtles. This quote about the "Outer Totalistic Cellular Automata Meta-Pixel" pretty much says it all, in terms that I certainly don't understand:
The rule is encoded in two columns, each of nine eaters, where one column corresponds to the 'Birth' rule and the other corresponds to 'Survival'. The nine eaters correspond to the nine different quantities of on cells (0 through 8). The presence or absence of the eater indicates whether the cell should be on in the next meta-generation. The state of the eater is read by the collision of two antiparallel LWSSes, which radiates two antiparallel gliders (not unlike an electron-positron reaction in a PET scanner). These gliders then collide into beehives, which are restored by a passing LWSS in Brice's elegant honeybit reaction. If the eater is present, the beehive would remain in its original state, thereby allowing the LWSS to pass unaffected; if the eater is absent, the beehive would be restored, consuming the LWSS in the process. Equivalently, the state of the eater is mapped onto the state of the LWSS.
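For reference, the rule being emulated at the meta level is just the ordinary outer-totalistic B3/S23 Life rule; a minimal sketch of one plain (non-meta) generation in Python looks roughly like this:

```python
from collections import Counter

def life_step(live_cells):
    """One generation of Conway's Life (outer-totalistic rule B3/S23).
    `live_cells` is a set of (x, y) coordinates of live cells."""
    # Count how many live neighbours every candidate cell has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours, survival on 2 or 3.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live_cells)
    }
```

The metapixel's two columns of nine eaters are, in effect, a lookup table for exactly those birth and survival counts (0 through 8), which is what lets the construction emulate any outer-totalistic rule rather than only B3/S23.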

Related

AlphaGo Zero board evaluation function uses multiple time steps as an input... Why?

According to AlphaGo Cheat Sheet, AlphaGo Zero uses a sequence of consecutive board configurations to encode its game state.
In theory, all the necessary information is contained in the latest state, and yet they include the previous 7 configurations.
Why did they choose to inject so much complexity?
What are they listening for?
AlphaGoZero
The sole reason is that in all of these games - Go, Chess, and Shogi - there is a repetition rule. What this means is that the game is not fully observable from the current board position. In other words, there may be two identical positions with two very different evaluations. For example, in one Go position there may be a winning move, but in an identical Go position that move is either illegal, or one of the next few moves in the would-be-winning continuation creates an illegal position.
You could try feeding in only the current board position and handling repetitions in the tree only. But I think this would be weaker because the evaluation function would be wrong in some cases, leading to a horizon effect if that branch of the tree had not been explored deeply enough to correct the problem.
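For a concrete picture of what "a sequence of consecutive board configurations" means as a network input, here is a rough sketch (my own illustration with made-up names, not the DeepMind code) of stacking the last eight positions into feature planes:

```python
import numpy as np

def encode_state(history, to_play, board_size=19):
    """Stack the last 8 board positions into AlphaGo Zero-style input planes.

    `history` is a list of boards (oldest first); each board is a
    board_size x board_size array with 1 = black stone, -1 = white, 0 = empty.
    Returns an array of shape (17, board_size, board_size).
    """
    last8 = history[-8:]
    # Pad with empty boards if fewer than 8 positions exist yet.
    while len(last8) < 8:
        last8.insert(0, np.zeros((board_size, board_size)))
    planes = []
    for board in last8:
        planes.append((board == to_play).astype(np.float32))   # own stones
    for board in last8:
        planes.append((board == -to_play).astype(np.float32))  # opponent stones
    # One extra plane indicating whose turn it is.
    planes.append(np.full((board_size, board_size), 1.0 if to_play == 1 else 0.0))
    return np.stack(planes)
```

With history planes like these, two positions that look identical on the current board can still produce different network inputs, which is exactly what the repetition argument above requires.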

Programming an intelligent game-playing bot

I am taking part in a programming competition where the objective is writing a bot that can play a specific game.
The objective of the game is to earn a certain amount of points. You control multiple airships that you move around, capture islands, and navigate drones that carry treasure. You play against one opponent, turns happen simultaneously, and there is a time limit. You can move multiple ships and drones in one turn. You can program your bot in Python, Java or C#.
The exact details don't matter, just that each ship has around 15 options each turn (moving and shooting), and overall you have around 10000 different options for each turn (different configurations of airship movements and shooting).
Up until now I have approached this competition naively and haven't done anything exceptionally clever (for example: if near an enemy, shoot). I have read about the minimax algorithm, and I would really like to apply it here (or something similar); you can assume that I can tell the value of a state. My problem is the mass of options for each turn, which creates an enormous branching factor that doesn't let me search very deep.
Question 1: Is there a better, applicable approach to this problem? Perhaps deep-learning or something similar?
Question 2: Is there a way to minimize the branching factor? I've read about alpha-beta pruning and similar algorithms, but nothing seems to do the job.
Any help would be much appreciated
The minimax algorithm seems natural for these kinds of problems. First, the game is modelled in an abstract way, and then a solver is used to find the path from the current situation to a game state which maximizes the amount of points. A similar approach to minimax is GOAP, which was implemented in the 1970s for Shakey the robot under the name STRIPS. But GOAP and minimax have two problems: first, an abstract model of the game is needed (perhaps in PDDL or in the Game Description Language), and second, the state space is too big.
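For reference, here is a minimal sketch of depth-limited minimax with alpha-beta pruning; `evaluate`, `legal_moves`, `apply_move` and `is_terminal` are game-specific stubs you would have to supply, and note that this assumes alternating moves, so simultaneous turns and a 10000-wide branching factor would still need heavy move filtering or sampling on top:

```python
def alphabeta(state, depth, alpha, beta, maximizing):
    """Depth-limited minimax with alpha-beta pruning (sketch).
    `evaluate`, `legal_moves`, `apply_move`, `is_terminal` are stubs."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for move in legal_moves(state):
            value = max(value, alphabeta(apply_move(state, move),
                                         depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:      # opponent will never allow this line
                break
        return value
    else:
        value = float("inf")
        for move in legal_moves(state):
            value = min(value, alphabeta(apply_move(state, move),
                                         depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value
```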
A better alternative to planning is to use a behavior tree. That is a static program which describes the behavior of an agent. No solver is needed, and no complete model of the game is needed. Instead, a bottom-up approach is used, with multiple edit-compile-run iterations to find the optimal behavior tree (test-driven development). To implement such a programming approach, a so-called "reactive planner" has to be implemented first, which is another word for a real-time scheduler: a module which maps a behavior tree onto a Gantt chart so that each action is executed at a specific moment in time. As an introduction, the Unity3D engine is a good starting point; it has a full behavior tree implementation out of the box.
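To show the contrast, here is a deliberately tiny behavior-tree sketch (my own simplification, nothing Unity-specific); the leaf functions named in the commented example are hypothetical:

```python
class Selector:
    """Try children in order until one succeeds."""
    def __init__(self, *children):
        self.children = children
    def tick(self, ship):
        return any(child.tick(ship) for child in self.children)

class Sequence:
    """Run children in order; fail as soon as one fails."""
    def __init__(self, *children):
        self.children = children
    def tick(self, ship):
        return all(child.tick(ship) for child in self.children)

class Action:
    """Leaf node wrapping a game-specific function that returns True/False."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self, ship):
        return self.fn(ship)

# Hypothetical leaves: enemy_in_range, shoot, capture_nearest_island, patrol.
# ship_behavior = Selector(
#     Sequence(Action(enemy_in_range), Action(shoot)),
#     Action(capture_nearest_island),
#     Action(patrol),
# )
```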

Implementing reinforcement learning in NetLogo (Learning in multi-agent models)

I am thinking of implementing a learning strategy for different types of agents in my model. To be honest, I still do not know what kind of questions I should ask first or where to start.
I have two types of agents that I want to learn by experience; they have a pool of actions, each of which has a different reward based on the specific situations that might happen.
I am new to reinforcement learning methods, so any suggestions on what kind of questions I should ask myself are welcome :)
Here is how I am going to formulate my problem:
Agents have a lifetime and keep track of a few things that matter to them, and these indicators are different for different agents; for example, one agent wants to increase A, another wants B more than A.
States are points in an agent's lifetime at which they have more than one option (I do not have a clear definition of states, as a given situation might happen a few times or not happen at all, because agents move around and might never face it).
The reward is an increase or decrease in an indicator that an agent can get from an action in a specific state, and the agent does not know what the gain would have been had it chosen another action.
The gain is not constant, the states are not well defined, and there is no formal transition from one state to another.
For example, an agent can decide to share with one of the co-located agents (Action 1) or with all of the agents at the same location (Action 2). If certain conditions hold true, Action 1 will be more rewarding for that agent, while under other conditions Action 2 will have the higher reward. My problem is that I have not seen any example with unknown rewards, since sharing in this scenario also depends on the other agents' characteristics (which affect the conditions of the reward system) and will differ between states.
In my model there is no relationship between the action and the following state, and that makes me wonder if it is OK to think about RL in this situation at all.
What I am looking to optimize here is my agents' ability to reason about the current situation in a better way, rather than only responding to the need triggered by their internal states. They have a few personality traits which define their long-term goals and affect their decision making in different situations, but I want them to remember which action in a given situation helped them advance their preferred long-term goal.
In my model there is no relationship between the action and the following state, and that makes me wonder if it is OK to think about RL in this situation at all.
This seems strange. What do actions do if not change state? Note that agents don't necessarily have to know how their actions will change their state. Similarly, actions could change the state imperfectly (a robot's treads could skid so it doesn't actually move when it tries to). In fact, some algorithms are specifically designed for this uncertainty.
In any case, even if the agents are moving around the state space without having any control, they can still learn the rewards for the different states. Indeed, many RL algorithms involve moving around the state space semi-randomly to figure out what the rewards are.
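A common way to do that semi-random exploration is an epsilon-greedy policy; a small illustrative sketch (the table layout here is just an assumption):

```python
import random

def choose_action(q_table, state, actions, epsilon=0.1):
    """Epsilon-greedy choice: usually exploit the current estimates, but
    sometimes act randomly so every state keeps getting sampled."""
    if state not in q_table or random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[state].get(a, 0.0))
```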
I do not have a clear definition of states, as a given situation might happen a few times or not happen at all, because agents move around and might never face it
You might consider expanding what goes into what you consider to be a "state". For instance, the position seems like it should definitely go into the variables identifying a state. Not all states need to have rewards (although good RL algorithms typically infer a measure of goodness of neutral states).
I would recommend clearly defining the variables that determine an agent's state. For instance, the state space could be current-patch X internal-variable-value X other-agents-present. In the simplest case, the agent can observe all of the variables that make up their state. However, there are algorithms that don't require this. An agent should always be in a state, even if the state has no reward value.
Now, concerning unknown reward. That's actually totally okay. Reward can be a random variable. In that case, a simple way to apply standard RL algorithms would be to use the expected value of the variable when making decisions. If the distribution is unknown, then the algorithm could just use the mean of the rewards observed so far.
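As a purely illustrative sketch, a tabular Q-learning update already copes with a noisy reward, because each estimate is only nudged a fraction `alpha` toward the latest sample and therefore settles near the expected reward; the state here is just a tuple of observable variables along the lines suggested above, and all names are hypothetical:

```python
from collections import defaultdict

q_table = defaultdict(lambda: defaultdict(float))
ALPHA, GAMMA = 0.1, 0.9   # learning rate, discount factor

def q_update(state, action, reward, next_state, next_actions):
    """One tabular Q-learning step; `reward` may be a noisy sample."""
    best_next = max((q_table[next_state][a] for a in next_actions), default=0.0)
    target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (target - q_table[state][action])

# Example: state is the tuple of variables the agent can observe,
# e.g. (current_patch, internal_variable_value, other_agents_present).
# q_update(("patch 3 4", 0.7, True), "share-with-one", 2.5,
#          ("patch 3 4", 0.9, True), ["share-with-one", "share-with-all"])
```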
Alternatively, you could include the variables that determine the reward in the definition of the state. That way, if the reward changes, then it is literally in a different state. For example, suppose a robot is on top of a building. It needs to get to the top of the building in front of it. If it just moves forward, it falls to the ground. Thus, that state has a very low reward. However, if it first places a plank that goes from one building to the other, and then moves forward, the reward changes. To represent this, we could include plank-in-place as a variable, so that putting the plank in place actually changes the robot's current state and the state that would result from moving forward. Thus, the reward itself has not changed; it's just in a different state.
Hopefully this helps!
UPDATE 2/7/2018: A recent upvote reminded me of the existence of this question. In the years since it was asked, I've actually dived into RL in NetLogo to a much greater extent. In particular, I've made a Python extension for NetLogo, primarily to make it easier to integrate machine learning algorithms with the model. One of the demos of the extension trains a collection of agents using deep Q-learning as the model runs.

In FSMs does one State last one clock cycle or more?

I need to design a simple one for school.
More specifically a Moore FSM. I'm not sure how state transitions happen; does it move to the next state on each clock?
I need to know because I'm wondering if I can shift a register and add a value to it, all in the same state... Could I use both clock edges?
EDIT:
I have to design the ALU part with registers as a gate-level schematic, so there is no target CPU.
I made the algorithm diagram, then assigned states to the function blocks according to Moore FSM rules; each block of operations gets one state.
For instance, in a state S1 I have the following operations: y0 = shift Reg1 left; y1 = Reg1 = Reg1 + Reg2. So the microcommand that the control part of the Moore FSM outputs would be 0000011 (yn...y1,y0). This microcommand should be the input to the ALU part which I need to design. Now I realize that y1 and y0 will conflict with each other, since both use Reg1.
It's problematic since I don't actually have the control part; I have to imagine the core FSM and design only the ALU with registers. This is why I was wondering whether I get more than one clock cycle, so I can sequence y0 and y1, or whether I have to complete the entire operation in one clock.
I plan on making parallel-in, parallel-out non-shift registers, so obviously I can't do the two operations of the microcommand at the same time. So what can I do:
1. Make extra states? Which I really don't want to do.
2. Use both edges of a single clock? (Might cause problems?)
3. Assume I get a preset number of ticks from the clock to complete the microcommand?
This would make the most sense, but I don't know if that's the case.
The control unit does "know" the algorithm and thus how many operations need to be performed.
I have to note again that the control part is totally abstract and I have no idea how this is handled in practice.
An FSM itself has no inherent notion of time (although one can be defined). A Moore machine is a simplified model and lacks the ability to even formally represent an ever-progressing "time" (without, of course, implementing the counting entirely with states); remember, there is only a finite set of states.
In any case, time can be introduced as an implementation detail of a particular FSM, and the amount of time required to change between particular states might not be constant. (A particular FSM might also map differently to different implementations.) In the case of a clocked system, it would require looking into how each "clock" is defined in the implementation; it might be the leading edge, the trailing edge, both, or something entirely different.
Instead of looking at the FSM here for guidance (it is just the logical progression of states), look at the opcodes (or whatever the implementation is) that the FSM represents, and how the CPU (or whatever the implementation is) in question "executes" them.
(What do the books say? ;-)
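To make the "one state per clock" idea concrete, here is a tiny, purely illustrative software sketch (not gate-level hardware): each loop iteration stands in for one clock edge, and the shift and the add from the question are split across two successive states so they never write Reg1 in the same cycle.

```python
def moore_fsm(reg1, reg2):
    """Simulate a Moore machine: one state per clock tick (illustrative only).
    S1a shifts Reg1, S1b adds Reg2 into Reg1, so the two writes to Reg1
    happen in different clock cycles."""
    state = "S1a"
    while state != "DONE":
        if state == "S1a":
            reg1 = (reg1 << 1) & 0xFF    # y0: shift Reg1 left (8-bit register)
            state = "S1b"
        elif state == "S1b":
            reg1 = (reg1 + reg2) & 0xFF  # y1: Reg1 = Reg1 + Reg2
            state = "DONE"
        # each loop iteration corresponds to one rising clock edge
    return reg1

print(moore_fsm(0b00000011, 0b00000001))  # -> 0b00000111 == 7
```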

real-time in context of a game

I have a problem grokking the concept of real-time (IMO badly named, with different meanings in different contexts). I understand real-time software as software where time is a key variable. Events must occur at a given time. Say, a railway switch changes at 15:02 and the next one must change at 15:05, no matter what.
But how about this example: in a game, when the player's FPS drops below 16, the game exits and tells the user to upgrade their hardware or kill other applications. So when one iteration of the game loop takes more than 1/16 of a second, the output of the program is completely different.
Is it real-time(ish)? Can it be considered Real-Time Computing?
Your question is hard to understand: are you referring to Real-Time Computing, or simulating real time, or something completely different?
Simulating real time: it is possible to simulate real time in a game by polling for events. Store the time of each event, and then when it comes time to render a frame, the game should repeatedly 'fast forward' by moving the current time to the time of the next event and handling that event. This repeats until there are no more events or the time is 'current'.
This requires you to have anything that is a function of time (such as velocity, position, acceleration) be calculated according to the current time. This means you would not have these attributes periodically updated, and allows your game to be deterministic, as the 'game time' is no longer dependent upon real time. It also makes things like game speed and pausing very simple to implement.
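A minimal sketch of that "fast forward to the next event" loop (`integrate`, `handle`, and the `world`/event fields are invented stubs for illustration):

```python
def advance(world, events, target_time):
    """Advance the simulation to `target_time` by consuming queued events in
    time order; positions, velocities etc. are evaluated as functions of time.
    (Illustrative sketch: `integrate` and `handle` are game-specific stubs.)"""
    events.sort(key=lambda e: e.time)
    while events and events[0].time <= target_time:
        event = events.pop(0)
        integrate(world, event.time - world.time)  # fast-forward to the event
        world.time = event.time
        handle(world, event)                       # apply the event itself
    integrate(world, target_time - world.time)     # finish the remaining gap
    world.time = target_time
```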
If you're referring to the concept of real-time systems, then I would say there's not enough information to determine whether that 'game loop' is 'real-time'. It depends on the operating environment of the game, and the logic in the 'game loop'. According to Wikipedia, a real-time deadline must be met regardless of system load.
In Fix Your Timestep!, an article that is rapidly approaching canonical status, Glenn Fiedler addresses numerous ways to handle this issue. While the article focuses primarily on physics, the key points are applicable to any system that represents a function of time; that is, anything dealing with moving things.
The executive summary of that article (which is well worth reading) is this:
You can make your physics deterministic (well, as much as can be achieved with imperfect input) by using discrete physics timesteps. It looks like this:
Render as fast as possible
Pass in a time delta that represents how long the previous frame took
Run as many fixed-size physics steps as fit into that delta time
Store the remainder of delta that you weren't able to process in an accumulator
That accumulator gets added to the next frame's time buffer. This requires some fine tuning such that temporary lag spikes due to e.g. a rapidly spinning player (which necessitates a lot of visibility determination over time) don't end up putting you in an inescapable time debt. If you wanted to intelligently guard against such an occurrence, you could have a sentry look for dangerous levels of accumulated time, which you could respond to by perhaps dropping a video frame.
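In code, the loop described above looks roughly like this (a sketch along the lines of the article's approach, with `running`, `physics_step` and `render` as placeholder stubs):

```python
import time

DT = 1.0 / 60.0          # fixed physics timestep
MAX_FRAME_TIME = 0.25    # clamp spikes to avoid an inescapable time debt

accumulator = 0.0
previous = time.perf_counter()
while running:
    now = time.perf_counter()
    frame_time = min(now - previous, MAX_FRAME_TIME)
    previous = now
    accumulator += frame_time            # bank the time this frame took
    while accumulator >= DT:
        physics_step(DT)                 # always step by the same amount
        accumulator -= DT                # keep the remainder for next frame
    render(accumulator / DT)             # optional interpolation factor
```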
Another advantage of using discrete timesteps is that they behave well in multiplayer games. If you have an authoritative server or node in a peer-to-peer configuration, the server can ensure that all clients' physics simulations are running on the same physics timeline. Discrete time blocks also simplify things in rollback-based multiplayer.
Edit:
Disclaimer: I've never written software for real-time myself, only worked in a company that had!
In response to really-real, real-life Real Time software: it's unlikely that anyone has made a game that would qualify as this, at least in software. (I'm not sure how one would classify games on ROMs or games that don't run under a host OS?) While your example would be an attempt at real-time software, most real-time software goes through a period of certification in which the maximum amount of time spent per instruction or per logical block of operations is determined. Games might come close to this in a sense when, for example, platform licensors have requirements (as I believe XBLA does) regarding a minimum of 30 fps or similar. However, these certifications are usually established through a period of testing rather than through mathematical proof.