Bayesian network and fuzzy logic - matlab

Can anyone give me an example of a Bayesian network and fuzzy logic being used in intrusion detection?
I'm struggling to figure out how it can be used. And any code on it?
Thanks guys.

The exact details will depend upon whether you're talking about a burglar alarm type situation (sensor readings) or something fancier involving security guards and sharks with lasers. Either way, the principle is the same.
You start with root nodes describing the basic things that affect intrusion, e.g.,
Sensor detected motion (true/false)
Shark smelt blood (true/false)
Temperature (too low/just right/too high)
Security guard is asleep
...
any other things you can think of.
You assign a probability to each state of each root node.
P(Security guard is asleep) = 0.25
Then you define child nodes that depend upon those root nodes, e.g., Security guard heard noise would depend upon Security guard is asleep.
You assign conditional probabilities for each state of the child nodes, given each state of its parents.
P(Security guard heard noise|Security guard is asleep) = 0.05
P(Security guard heard noise|Security guard is not asleep) = 0.5
Eventually, you'll want to get to an outcome like Burglary has been foiled.
Once you have your network node set up, you can evaluate it, and calculate the probability of different outcomes happening.
Next you add evidence. So if you know your shark smelt blood, that node gets set to a particular value and you can reevaluate the network to see how probabilities have changed.
In terms of software, the Bayes Net toolbox is well regarded.

Related

Observation Space for race strategy development - Reinforcement learning

I refrained from asking for help until now, but as my thesis' deadline creeps ever closer and I do not know anybody with experience in RL, I'm trying my luck here.
TLDR;
I have not found an academic/online resource which helps me understand the correct representation of the environment as an observation space. I would be very thankful for any links or for giving me a starting point of how to model the specifics of my environment in an observation space.
Short thematic introduction
The goal of my research is to determine the viability of RL for strategy development in motorsports. This is currently achieved by simulating (lots of!) races and calculating the resulting race time (thus end-position) of different strategic decisions (which are the timing of pit stops + amount of laps to refuel for). This demands a manual input of expected inlaps (the lap a pit stop occurs) for all participants, which implicitly limits the possible strategies by human imagination as well as the amount of possible simulations.
Use of RL
A trained RL agent could decide on its own when to perform a pit stop and how much fuel should be added, in order to minizime the race time and react to probabilistic events in the simulation.
The action space is discrete(4) and represents the options to continue, pit and refuel for 2,4,6 laps respectively.
Problem
The observation space is of POMDP nature and needs to model the agent's current race position (which I hope is enough?). How would I implement the observation space accordingly?
The training is performed using OpenAI's Gym framework, but a general explanation/link to article/publication would also be appreciated very much!
Your observation could be just an integer which represents round or position the agent is in. This is obviously not a sufficient representation so you need to add more information.
A better observation could be the agents race position x1, the round the agent is in x2 and the current fuel in the tank x3. All three of these can be represented by a real number. Then you can create your observation by concating these to a vector obs = [x1, x2, x3].

How is the bias node integrated in NEAT?

In NEAT you can add a special bias input node that is always active. Regarding the implementation of such a node there is not much information in the original paper. Now I want to know how the bias node should behave, if there is a at all a consensus.
So the question is:
Do connections from the bias node come about during evolution and can be split for new nodes just like regular connections or does the bias node always have connections to all non-input nodes?
To answer my own question: According to the NEAT users page Kenneth O. Stanley talks about why the bias in NEAT is used as an extra input neuron:
Why does NEAT use a bias node instead of having a bias parameter in each node?
Mainly because not all nodes need a bias. Thus, it would unnecessarily enlarge the search space to be searching for a proper bias for every node in the system. Instead, we let evolution decide which nodes need biases by connecting the bias node to those nodes. This issue is not a major concern; it could work either way. You can easily code a bias into every node and try that as well.
My best guess is therefore that the BIAS input is treated like any other input in NEAT, with the difference that it is always active.

How should one set up the immediate reward in a RL program?

I want my RL agent to reach the goal as quickly as possible and at the same time to minimize the number of times it uses a specific resource T (which sometimes though is necessary).
I thought of setting up the immediate rewards as -1 per step, an additional -1 if the agent uses T and 0 if it reaches the goal.
But the additional -1 is completely arbitrary, how do I decide how much punishment should the agent get for using T?
You should use a reward function which mimics your own values. If the resource is expensive (valuable to you), then the punishment for consuming it should be harsh. The same thing goes for time (which is also a resource if you think about it).
If the ratio between the two punishments (the one for time consumption and the one for resource consumption) is in accordance to how you value these resources, then the agent will act precisely in your interest. If you get it wrong (because maybe you don't know the precise cost of the resource nor the precise cost of slow learning), then it will strive for a pseudo optimal solution rather than an optimal one, which in a lot of cases is okay.

How to make simulated electric components behave nicely?

I'm making a simple electric circuit simulator. It will (at least initially) only feature batteries, wires and resistors in series and parallel. However, I'm at a loss how best to simulate said circuit in a good way.
Specifically, I will have batteries and resistors with two contact points each, and wires that go between two contact points. I assume that each component will have a field for its resistance, the current through it and the voltage across it (current and voltage will, of course, be signed). Each component is given a resistance, and the batteries are given a voltage. The goal of the simulation is to assign correct values to all the other fields in real time as the player connects and disconnects components and wires.
These are the requirements:
It must be correct, including Ohm's and Kirchhoff's laws (I'm modeling real world circuits, and there is little point if the model does something completely different)
It must be numerically stable (we can't have uncontrolled oscillations or something just because two neighbouring resistors can't make up their minds together)
It should stabilize relatively quickly for, let's say, fewer than 30 components (having to wait a few seconds before the values are correct doesn't really satisfy "real time", but I really don't plan on using it for more than 10 or maybe 20 components)
The optimal formulation for me (how I envision this in my head) would be if I could assign a script to each component that took care of that component only, possibly by communicating field values with neighbouring components, and each component script works in parallel and adjusts as is needed
I only see problems here and no solutions. The biggest problem, I think, is Kirchhoff's voltage law (going around any sub-circuit, the voltage across all components, including signs, add up to 0), because that's a global law (it says somehting about a whole circuit and not just a single component / connection point). There is a mathematical reformulation saying that there exists a potential function on the points in the circuit (for instance, the voltage measured against the + pole of the battery), which is a bit more local, but I still don't see how to let a component know how much the voltage / potential drops across it.
Kirchhoff's current law (the net current flow into an intersection is 0) might also be trouble. It seems to force me to make intersections into separate objects to enforce it. I originally thought that I could just let each component have two lists (a left list and a right list) containing every other component that is connected to it at that point, but that might not make KCL easily enforcable.
I know there are circuit simulators out there, and they must have solved this exact problem somehow. I just can't find an explanation because if I try googling it, I only find the already made simulators and no explanations anywhere.

Implementing reinforcement learning in NetLogo (Learning in multi-agent models)

I am thinking to implement a learning strategy for different types of agents in my model. To be honest, I still do not know what kind of questions should I ask first or where to start.
I have two types of agents which I want them to learn by experience, they have a pool of actions which each has different reward based on specific situations that might happen.
I am new to reinforcement Learning methods, therefore any suggestions on what kind of questions should I ask myself is welcomed :)
Here is how I am going forward to formulate my problem:
Agents have a lifetime and they keep track of a few things that matter for them and these indicators are different for different agents, for example, one agent wants to increase A another wants B more than A.
States are points in an agent's lifetime which they
Have more than one option (I do not have a clear definition for
States as they might happen a few times or not happen at all because
Agents move around and they might never face a situation)
The reward is the an increase or decrease in an indicator that agents can get from an action in a specific State, and agent do not know what would be the gain if he chose another action.
The gain is not constant, the states are not well defined and there is no formal transition of one state into another,
For example agent can decide to share with one of the co-located agent (Action 1) or with all of the agents at the same location(Action 2) If certain conditions hold true Action A will be more rewarding for that agent, while in other conditions Action 2 will have higher reward; my problem is I did not see any example with unknown rewards since sharing in this scenario also depends on the other agent's characteristics (which affects the conditions of reward system) and in different states it will be different.
In my model there is no relationship between the action and the following state,and that makes me wonder if its ok to think about RL in this situation at all.
What I am looking to optimize here is the ability for my agents to reason about current situation in a better way and not only respond to their need which is triggered by their internal states. They have a few personalities which can define their long term goal and can affect their decision making in different situations, but I want them to remember what action in a situation helped them to increase their preferred long term goal.
In my model there is no relationship between the action and the following state,and that makes me wonder if its ok to think about RL in this situation at all.
This seems strange. What do actions do if not change state? Note that agents don't have to necessarily know how their actions will change their state. Similarly, actions could change the state imperfectly (a robots treads could skid out so it doesn't actually move when it tries to). In fact, some algorithms are specifically designed for this uncertainty.
In any case, even if the agents are moving around the state space without having any control, it can still learn the rewards for the different states. Indeed, many RL algorithms involve moving around the state space semi-randomly to figure out what the rewards are.
I do not have a clear definition for States as they might happen a few times or not happen at all because Agents move around and they might never face a situation
You might consider expanding what goes into what you consider to be a "state". For instance, the position seems like it should definitely go into the variables identifying a state. Not all states need to have rewards (although good RL algorithms typically infer a measure of goodness of neutral states).
I would recommend clearly defining the variables that determine an agent's state. For instance, the state space could be current-patch X internal-variable-value X other-agents-present. In the simplest case, the agent can observe all of the variables that make up their state. However, there are algorithms that don't require this. An agent should always be in a state, even if the state has no reward value.
Now, concerning unknown reward. That's actually totally okay. Reward can be a random variable. In that case, a simple way to apply standard RL algorithms would be to use the expected value of the variable when making decisions. If the distribution is unknown, then the algorithm could just use the mean of the rewards observed so far.
Alternatively, you could include the variables that determine the reward in the definition of the state. That way, if the reward changes, then it is literally in a different state. For example, suppose a robot is on top of a building. It needs to get to the top of the building in front of it. If it just moves forward, it falls to ground. Thus, that state has a very low reward. However, if it first places a plank that goes from one building to the other, and then moves forward, the reward changes. To represent this, we could include plank-in-place as a variable so that putting the board in place actually changes the robot's current state and the state that would result from moving forward. Thus, the reward itself has not changed; it's just in a different state.
Hopefully this helps!
UPDATE 2/7/2018: A recent upvote reminded me of the existence of this question. In the years since it was asked, I've actually dived into RL in NetLogo to a much greater extent. In particular, I've made a python extension for NetLogo, primarily to make it easier to integrate machine learning algorithms in with model. One of the demos of the extension trains a collection of agents using deep Q-learning as the model runs.