Feasibility of Machine Learning techniques for Network Intrusion Detection

Feasibility of Machine Learning techniques for Network Intrusion Detection - matlab

Is there a machine learning concept (algorithm or multi-classifier system) that can detect the variance of network attacks(or try to).
One of the biggest problems for signature based intrusion detection systems is the inability to detect new or variant attacks.
Reading up, anomaly detection seems to still be a statistical based en-devour it refers to detecting patterns in a given data set which isn't the same as detecting variation in packet payloads. Anomaly based NIDS monitors network traffic and compares it against an established baseline of a normal traffic profile. The baseline characterizes what is "normal" for the network - such as the normal bandwidth usage, the common protocols used, correct combinations of ports numbers and devices etc
Say some one uses Virus A to propagate through a network then some one writes a rule to stop Virus A but another person writes a "variation" of Virus A called Virus B purely for the purposes of evading that initial rule but still using most if not all of the same tactics/code. Is there not a way to detect variance?
If there is whats the umbrella term it would come under, as ive been under the illusion that anomaly detection was it.
Could machine learning be used for pattern recognition(rather than pattern matching) at the packet payload level?

i think your intution to look at machine learning techniques is correct, or will turn out to be correct (One of the biggest problems for signature based intrusion detection systems is the inability to detect new or variant attacks.) The superior performance of ML techiques is in general due to the ability of these algorithms to generalize (a multiplicity of soft constraints rather than a few hard constraints). and to adapt (updates based on new training instances to frustrate simple countermeasures)--two attributes that i would imagine are crucial for identifying network attacks.
The theoretical promise aside, there are practical difficulties with applying ML techniques to problems like the one recited in the OP. By far the most significant is the difficultly in gathering data to train the classifier. In particular, reliably labeling data points as "intrusion" is probably not easy; likewise, my guess is that these instances are sparsely distributed in the raw data."
I suppose it's this limitation that has led to the increased interest (as evidenced at least by the published literature) in applying unsupervised ML techniques to problems like network intrusion detection.
Unsupervised techniques differ from supervised techniques in that the data is fed to the algorithms without a response variable (i.e., without the class labels). In these cases you are relying on the algorithm to discern structure in the data--i.e., some inherent ordering in the data into reasonably stable groups or clusters (possibly what you the OP had in mind by "variance." So with an unsupervised technique, there is no need to explicitly show the algorithm instances of each class, nor is it necessary to establish baseline measurements, etc.
The most frequently used unsupervised ML technique applied to problems of this type is probably the Kohonen Map (also sometimes called self-organizing map or SOM.)
i use Kohonen Maps frequently, but so far not for this purpose. There are however, numerous published reports of their successful application in your domain of interest, e.g.,
Dynamic Intrusion Detection Using Self-Organizing Maps
Multiple Self-Organizing Maps for Intrusion Detection
I know MATLAB has at least one available implementation of Kohonen Map--the SOM Toolbox. The homepage for this Toolbox also contains a brief introduction to Kohonen Maps.

Related

What is the Difference between evolutionary computing and classification?

I am looking for some comprehensive description. I couldn't find it via browsing as things are more clustered on the web and its not in my scope currently.

Classification and evolutionary computing is comparing oranges to apples. Let me explain:
Classification is a type of problem, where the goal is to determine a label given some input. (Typical example, given pixel values, determine image label).
Evolutionary computing is a family of algorithms to solve different types of problems. They work with a "population" of candidates (imagine a set of different neural networks trying to solve a given problem). Somehow you evaluate how good each candidate is in the given task (typically using a "fitness function", but there are other methods). Then a new generation of candidates is produced, taking the best candidates from the previous generation as a model, and including mutations and cross-over (that is, introducing changes). Repeat until happy.
Evolutionary computing can absolutely be used for classification! But there are examples where it is used in different ways. You may use evolutionary computing to create an artificial neural network controlling a robot (in this case, inputs are sensor values, outputs are commands for actuators). Or to create original content free of a given goal, as in Picbreeder.
Classification may be solved using evolutionary computation (maybe this is why you where confused in the first place) but other techniques are also common. You can use decision trees, or notably deep-learning (based on backpropagation).
Deep-learning based on backpropagation may sound similar to evolutionary computation, but it is quite different. Here you have only one artificial neural network, and a clear rule (backpropagation) telling you which changes to introduce every iteration.
Hope this helps to complement other answers!

Classification algorithms and evolutionary computing are different approaches. However, they are related in some ways.
Classification algorithms aim to identify the class label of new instances. They are trained with some labeled instances. For example, recognition of digits is a classification algorithm.
Evolutionary algorithms are used to find out the minimum or maximum solution of an optimization problem. They randomly explore the solution space of the given problem. They can find a good solution in a reasonable time and are not able to find the global optimum in all problems.
In some classification approaches, evolutionary algorithms are used to find out the optimal value of the parameters.

Use a trained neural network to imitate its training data

I'm in the overtures of designing a prose imitation system. It will read a bunch of prose, then mimic it. It's mostly for fun so the mimicking prose doesn't need to make too much sense, but I'd like to make it as good as I can, with a minimal amount of effort.
My first idea is to use my example prose to train a classifying feed-forward neural network, which classifies its input as either part of the training data or not part. Then I'd like to somehow invert the neural network, finding new random inputs that also get classified by the trained network as being part of the training data. The obvious and stupid way of doing this is to randomly generate word lists and only output the ones that get classified above a certain threshold, but I think there is a better way, using the network itself to limit the search to certain regions of the input space. For example, maybe you could start with a random vector and do gradient descent optimisation to find a local maximum around the random starting point. Is there a word for this kind of imitation process? What are some of the known methods?

How about Generative Adversarial Networks (GAN, Goodfellow 2014) and their more advanced siblings like Deep Convolutional Generative Adversarial Networks? There are plenty of proper research articles out there, and also more gentle introductions like this one on DCGAN and this on GAN. To quote the latter:
GANs are an interesting idea that were first introduced in 2014 by a
group of researchers at the University of Montreal lead by Ian
Goodfellow (now at OpenAI). The main idea behind a GAN is to have two
competing neural network models. One takes noise as input and
generates samples (and so is called the generator). The other model
(called the discriminator) receives samples from both the generator
and the training data, and has to be able to distinguish between the
two sources. These two networks play a continuous game, where the
generator is learning to produce more and more realistic samples, and
the discriminator is learning to get better and better at
distinguishing generated data from real data. These two networks are
trained simultaneously, and the hope is that the competition will
drive the generated samples to be indistinguishable from real data.
(DC)GAN should fit your task quite well.

How to find bridges (community connecting nodes) in large networks represented using the adjacency matrix

I have networks of roughly 10K to 100K nodes which are all connected. These nodes are typically grouped into clusters of communities which are strongly connected with many edges between them and there are hubs etc. Between the communities there are nodes with a few edges bridging / connecting the communities together. These datasets are in adjacency matrices
I have tried spectral clustering (Ding et al 2001) but it is really slow on large data sets and seems to stop working when there is a lot of ambiguity (bridges which are not the only bridge route to another cluster- other communities can act as alternative proxy routes).
I have tried some of the methods from martelot such as the Newman algorithm for modularity optimisation but have not incorporated the stability optimisation functions in that effort (could that be crucial?). On synthetic data sets where the clusters are created by random graphs (ER graphs) the methods work but on real ones where there is nested hierarchy the results are scattered. Using a standalone visualization application/tool the bridges are evident though.
What methods would you recommend/advise to try? I am using MATLAB.

What do you want to do, exactly? Detect communities, or bridges between them? Those are two different problems. Once you have the communities, it's straightforward enough identifying the edges connecting nodes from two distinct communities. So, I guess you want to detect communities.
There are actually thousands methods for this purpose, some of them implemented in Matlab, such as the one you cite, or the generalized Louvain algorithm (also based on modularity optimization). However, most of them are rather available as C or C++ programs, such as InfoMap (based on a data compression paradigm), WalkTrap (clustering using a random walk-based distance), Markov Cluster (simulates some propagation mechanism), and the list goes on...
Those tools formalize the notion of community structure more or less differently, potentially leading to different (estimated) community structures, when applied on the same network. And of course, different communities means different bridges, too. So the question is rather to know how to pick the appropriate method for your data. You seem to have a priori knowledge regarding the networks you are studying, so you should use that to make your choice (rather than the programming language). For instance, even if you don't state it explicitly, you seem to be looking for a hierarchical community structure: not all tools are able to detect this kind of structure. Similarly, if you think one node can belong to several communities at the same time, then you should consider looking for overlapping communities, for instance using CFinder (based on clique percolation).
I'd advise you to have a look at this excellent review of community detection, you might find some interesting information allowing you to pick a method: Community Detection in Graphs. Also, from a programming point of view, I'd advise you to play with the igraph library (available for C, R and Python): it contains several standard community detection tools. You can try them on your data and see what you get.

Artificial Neural Network that creates it's own connections

I've been reading about feed forward Artificial Neural Networks (ANN), and normally they need training to modify their weights in order to achieve the desired output. They will also always produce the same output when receiving the same input once tuned (biological networks don't necessarily).
Then I started reading about evolving neural networks. However, the evolution usually involves recombining two parents genomes into a new genome, there is no "learning" but really recombining and verifying through a fitness test.
I was thinking, the human brain manages it's own connections. It creates connections, strengthens some, and weakens others.
Is there a neural network topology that allows for this? Where the neural network, once having a bad reaction, either adjusts it's weights accordingly, and possibly creates random new connections (I'm not sure how the brain creates new connections, but even if I didn't, a random mutation chance of creating a new connection could alleviate this). A good reaction would strengthen those connections.
I believe this type of topology is known as a Turing Type B Neural Network, but I haven't seen any coded examples or papers on it.

This paper, An Adaptive Spiking Neural Network with Hebbian Learning, specifically addresses the creation of new neurons and synapses. From the introduction:
Traditional rate-based neural networks and the newer spiking neural networks have been shown to be very effective for some tasks, but they have problems with long term learning and "catastrophic forgetting." Once a network is trained to perform some task, it is difficult to adapt it to new applications. To do this properly, one can mimic processes that occur in the human brain: neurogenesis and synaptogenesis, or the birth and death of both neurons and synapses. To be effective, however, this must be accomplished while maintaining the current memories.
If you do some searching on google with the keywords 'neurogenesis artificial neural networks', or similar, you will find more articles. There is also this similar question at cogsci.stackexchange.com.

neat networks as well as cascading add their own connections/neurons to solve problems by building structures to create specific responses to stimuli

Dual neural networks experiment (one logical, one emotional)?

Seeing that as as far as we know, one half of your brain is logical and the other half of your brain is emotional, and that the wants of the emotional side are fed to the logical side in order to fulfill those wants; has there been any research done in connecting two separate neural networks to one another (one trained to be emotional, and one trained to be logical) to see if it would result in almost a free-will sort of "brain"?
I don't really know anything about neural networks except that they were modeled after the biological synapses in the human brain, which is why I ask.
I'm not even sure if this would be possible considering that even a trained neural network sometimes doesn't act logically (a.k.a. do what you thought you trained it to do).

First, most modern neural networks aren't really modeled after biological synapses. They use an Artificial Neuron which allowed Back Propagation to work rather than a Perceptron which is a much more accurate representation.
When you feed the output of one network into the input of another network, you've really just created one larger network, not two separate networks. It just happens that in this case portions of the networks would be trained independently.
That said, all neural networks have to be trained. Which means you need sample input and sample output. You are looking to create a decision engine of sorts I suppose. So you would need to create a dataset where it makes sense that there might be an emotional and rational response, such as purchasing an item. You'd have to train the 'rational' network to accept as a set of inputs the output of an 'emotional' network. Which means you are really just training the rational decision engine to be responsive based on the leve of 'distress' caused by the emotional network.
Just my two cents.

I have also heard of one hemisphere being called "divergent" and one "convergent". This may not make any more sense than emotional vs logical, but it does hint at how you might model it more easily. I don't know how the brain achieves some of the impressive computational feats it does, but I wouldn't be very surprised if all revolved around balance, but maybe that is just one of the baises you have when you are a brain with two hemipheres (or any even number) :D
A balance between convergence and divergence is the crux of the creativity inherent in evolution. Replicating this with neural nets sounds promising to me. Suppose you make one learning system that generalizes and keeps representations of only the typical groups of patterns it is shown. Then you take another and make it generate all the in-betweens and mutants of the patterns it is shown. Then you feed them to eachother in a circle, and poof, you have made something really interesting!

It's even more complex than that, unbelievably. The left hemisphere works on a set of logical rules, it uses these to predict its environment and categorize input. It also infers rules and stores them for future use. The right hemisphere is based, as you said, on emotion, but also on memory of single, unique or emotionally relevant occurrences. A software implementation should also be able to retrieve and store these two data types and exchange "opinions" about them.

While the left hemisphere of the brain may be more involved in making emotional decisions, emotion itself is unlikely to occur exclusively in one side of the brain, and the interplay between emotions and rational thought within the brain is likely to be substantially more complex than having two completely separate circuits. For instance, a study on rhesus macaques found that dopamine and other hormones associated with emotional responses essentially implements temporal difference learning within the brain (I'm still looking for a link to it). This suggests that separating emotional and rational thought into two separate neural networks probably wouldn't be practical, even if we had the resources to build neural networks on the scale of brain hemispheres (which we don't, or at least not within most research budgets).
This idea is supported by Sloman and Croucher's suggestion that emotion will likely be an unavoidable emergent property of a sufficiently advanced intelligent system. Such systems (discussed in detail in the paper) will be much more complex than straight-up neural nets. More importantly, though, the emotions won't be something that you can localize to one part of the system.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse