Markov chain to predict banking frauds - complex-event-processing

I want to develop a system for detecting and preventing bank transaction fraud using complex event processing. I've been researching, and it looks like Markov chains could help me with this.
What are the general steps and data flow to develop such system?
I'm not looking for complete answers, just the generic steps, so I can research and obtain my own answers.
Given this scenario, what exactly can I predict with a Markov chain?
I know it's not a specific question, but I'm not looking for specific answers.

You need full insight into all possible parameters and their dependencies. Every parameter requires its own dimension, so the calculations become rather big. There are other algorithms that are much simpler and more elegant.
Markov chains are better suited to problems with fewer parameters. For scenarios like this you should try other methods, such as support vector machines.
Also have a look into more elegant solutions like Benford's Law.
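To get a quick feel for the Benford's Law idea: amounts in genuine transaction data tend to start with small digits far more often than large ones, so a first-digit histogram that deviates strongly from the Benford distribution can flag a batch for review. A minimal sketch (the sample amounts are made up for illustration):

```python
import math
from collections import Counter

def benford_deviation(amounts):
    """Compare the first-digit distribution of amounts against
    Benford's Law, P(d) = log10(1 + 1/d) for d = 1..9.
    Returns the total absolute deviation (smaller = closer to Benford)."""
    digits = [int(str(abs(a)).lstrip("0.")[0]) for a in amounts if a != 0]
    counts = Counter(digits)
    n = len(digits)
    deviation = 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts.get(d, 0) / n
        deviation += abs(observed - expected)
    return deviation

# Hypothetical amounts: organic data tends to follow Benford,
# while fabricated round figures often do not.
organic = [123.45, 18.20, 1450.00, 2.75, 31.99, 170.10, 12.00, 95.50]
print(benford_deviation(organic))
```

In a real pipeline you would compute this per account or per batch and alert when the deviation crosses a threshold chosen from historical data.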

Related

neural network for sudoku solver

I recently started learning about neural networks, and I thought that creating a sudoku solver would be a nice application for a NN. I started learning with a backpropagation neural network, but later I figured out that there are dozens of neural network types. At this point, I find it hard to learn all of them and then pick an appropriate one for my purpose. Hence, I am asking what would be a good choice for creating this solver. Can a backpropagation NN work here? If not, can you explain why and tell me which one can?
Thanks!
Neural networks don't really seem to be the best way to solve sudoku, as others have already pointed out. I think a better (but also not really good/efficient) way would be to use a genetic algorithm. Genetic algorithms don't directly relate to NNs, but it's very useful to know how they work.
Better ideas (by which I mean more likely to be successful, and probably better for learning something new) would include:
If you use a library:
Play around with the networks: try to train them on different datasets, maybe random numbers, and see what you get and how you have to tune the parameters to get better results.
Try to write an image generator. I wrote a few of them and they are still my favourite projects. For one of them I used backprop to teach a NN which colour each x/y coordinate of the image has; the other approach combines randomly generated images with one another (GAN/NEAT).
Try to create a movie (a series of images) of the network learning to create a picture. It will show you very well how backprop works, what parameter tuning does to the results, and how it changes the way the network gets to the result.
If you are not using a library:
Try to solve easy problems, one after the other. Use backprop or a genetic algorithm for training (whatever you have implemented).
Try to improve your implementation and change some things that nobody else cares about, and see how it changes the results.
List of 'tasks' for your Network:
XOR (basically the hello world of NN)
Pole balancing problem
Simple games like pong
More complex games like flappy bird, agar.io etc.
Choose more problems that you find interesting; maybe you are into image recognition, maybe text, audio, who knows. Think of something you can/would like to be able to do and find a way to make your computer do it for you.
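To make the first task on the list concrete, here is a minimal backprop net learning XOR, written from scratch in numpy (the layer size, learning rate, and epoch count are arbitrary choices, not tuned values):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR truth table: the "hello world" of neural networks.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 sigmoid units, trained with plain backprop.
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
lr = 0.5

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass for squared error (constant factors absorbed by lr)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print((out > 0.5).astype(int).ravel())  # should approach [0 1 1 0]
```

Once this works, the same loop structure carries over to the harder tasks; only the data and the layer sizes change.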
It's not advisable to only use your own NN implementation, since it will probably not work properly the first few times and you'll get frustrated. Experiment with both libraries and your own implementation.
Good way to find almost endless resources:
Use a Google search and add 'filetype:pdf' at the end to only show PDF files. Search for neural networks, genetic algorithms, and evolutionary neural networks.
Neither neural nets nor GAs are close to ideal solutions for Sudoku. I would advise looking into Constraint Programming (e.g. the Choco or Gecode solvers). See https://gist.github.com/marioosh/9188179 for an example. It should solve any 9x9 sudoku in a matter of milliseconds (the daily Sudokus of the "Le Monde" newspaper are created using this type of technology, BTW).
There is also Knuth's famous "Dancing Links" algorithm for this problem, which works very well: https://en.wikipedia.org/wiki/Dancing_Links
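To illustrate the constraint-based idea without a solver library, here is a bare-bones backtracking sketch over the row/column/box constraints; real CP solvers like Choco or Gecode add far stronger constraint propagation and are much faster on hard instances:

```python
def valid(board, r, c, v):
    """Check the three sudoku constraints for placing v at (r, c)."""
    if any(board[r][j] == v for j in range(9)):
        return False
    if any(board[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(board[br + i][bc + j] != v for i in range(3) for j in range(3))

def solve(board):
    """Solve a 9x9 sudoku in place; 0 marks an empty cell.
    Plain depth-first backtracking: fill the first empty cell with the
    smallest consistent digit, recurse, undo on failure."""
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                for v in range(1, 10):
                    if valid(board, r, c, v):
                        board[r][c] = v
                        if solve(board):
                            return True
                        board[r][c] = 0
                return False
    return True

puzzle = [[0] * 9 for _ in range(9)]  # empty grid as a demo input
solve(puzzle)
print(puzzle[0])  # first row of one valid completed grid
```

Even this naive version solves typical newspaper puzzles quickly; Dancing Links is essentially a much smarter way of organizing the same search.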
Just as was mentioned in the comments, you probably want to take a look at convolutional networks. You basically input the sudoku board as a two-dimensional 'image'. I think using a receptive field of 3x3 would be quite interesting, and I don't really think you need more than one filter.
The harder thing is normalization: the numbers 1-9 don't have an underlying relation in sudoku; you could easily replace them with A-I, for example. So they are categories, not numbers. However, one-hot encoding every cell would mean a lot of inputs, so I'd stick to numerical normalization (1 = 0.1, 2 = 0.2, etc.).
The output of your network should be a softmax of some kind: if you don't use softmax and instead output just an x and y coordinate, then you can't ensure that the output square has not been filled in yet.
A numerical value should be passed along with the output, to show what number the network wants to fill in.
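The two input encodings described above can be sketched like this (a randomly filled board stands in for a real puzzle):

```python
import numpy as np

rng = np.random.default_rng(0)
board = rng.integers(0, 10, size=(9, 9))  # 0 = empty cell

# Option 1: numerical normalization (1 -> 0.1, ..., 9 -> 0.9, empty -> 0.0);
# compact, but it imposes an ordering the digits don't actually have.
numeric = board / 10.0                     # shape (9, 9), one input channel

# Option 2: one-hot encoding; each digit becomes its own binary channel,
# which treats the digits as categories at the cost of 9x more inputs.
one_hot = np.zeros((9, 9, 9))
for d in range(1, 10):
    one_hot[..., d - 1] = (board == d)

print(numeric.shape, one_hot.shape)  # (9, 9) (9, 9, 9)
```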
As PLEXATIC mentioned, neural nets aren't really well suited for this kind of task. A genetic algorithm sounds good indeed.
However, if you still want to stick with neural nets, you could have a look at https://github.com/Kyubyong/sudoku. As Thomas W answered, a 3x3 receptive field looks nice.
If you don't want to deal with CNNs, you can find some answers here as well: https://www.kaggle.com/dithyrambe/neural-nets-as-sudoku-solvers

When (why) to use step/pulse/ramp functions in Simulink?

Hello guys, I'd like to know the answer to the question in the title.
For example, if I have a physical system described by differential equation(s), how should I know when to use a step, pulse or ramp generator?
What exactly do they do?
Thank you for your answers.
They are mostly remnants of the classical control era. The main reason they are so famous is their simple Laplace transforms: 1, 1/s, and 1/s^2. You can multiply these with the plant's transfer function and get the Laplace transform of the output.
Back in the day, all you had were partial fraction expansion and Laplace transform tables to get an idea of what the response would look like. Today, you can simulate basically whatever input you like, so they are not really needed, which is the answer to your question.
But since people used these signals so often, they spotted certain properties. For example, the step response is good for assessing the transients and the steady-state tracking error, and the ramp response is good for assessing (reference) following error (which motivates double integrators), and so on. Hence, some consider these signals to be characteristic functions, though that is far from the truth. In particular, keep in mind that just because these responses look OK, the system is not necessarily stable.
However, keep in mind that these are quite primitive ways of assessing a system. Currently, they are taught because they are good for giving homework and making people acquainted with Simulink etc.
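As a concrete illustration, simulating a hypothetical first-order plant tau*dy/dt + y = u with forward Euler shows exactly these properties: the step error decays to zero, while the ramp is followed with a constant lag of tau (this is a numpy stand-in for what the Simulink step and ramp blocks would feed into the plant):

```python
import numpy as np

# First-order plant tau*dy/dt + y = u, integrated with forward Euler.
tau, dt, T = 1.0, 0.001, 10.0
t = np.arange(0.0, T, dt)

def simulate(u):
    y = np.zeros_like(t)
    for k in range(1, len(t)):
        y[k] = y[k - 1] + dt * (u[k - 1] - y[k - 1]) / tau
    return y

step = np.ones_like(t)   # u(t) = 1, Laplace transform 1/s
ramp = t.copy()          # u(t) = t, Laplace transform 1/s^2

y_step = simulate(step)
y_ramp = simulate(ramp)

print(step[-1] - y_step[-1])  # step tracking error -> 0
print(ramp[-1] - y_ramp[-1])  # ramp following error -> tau (constant lag)
```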
They are used to determine system characteristics. If you are studying a system of differential equations, you would want to know different characteristics of the system's response to these kinds of inputs, since they are the most fundamental ones. For example, a system whose output blows up for a pulse input is unstable, and you would not want such a system in real life (except in rare situations). It's too difficult for me to explain it all in an answer; you should start with this wiki page.

Adding Interaction Terms to MATLAB Multiple Regression

I am currently running a multiple linear regression using MATLAB's LinearModel.fit function, and I am a bit confused about how to properly add interaction terms to the model by hand. As far as I am aware, LinearModel.fit does not standardize variables on its own, so I have been doing so manually.
So far, the way I have done it has been to
Standardize the observations for each variable
Multiply corresponding standardized values from specific variables to create the interaction terms and then add these new variables to the set of regression data
Run the regression
Is this the correct way to go about doing this? Should I standardize the interaction term variables also after calculating the 'raw' terms? Any help would be greatly appreciated!
Whether or not to standardize interaction terms probably depends on what you intend to do with the model. Standardization typically does not affect model performance as much as it allows for more straightforward model interpretation as your learned coefficients will be on similar scales. I suspect whether to do this or not is largely a matter of opinion. Here is a relevant stats.stackexchange post that may help.
My intuition would be the same as how you have described your process so far.
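The three steps from the question can be sketched in a few lines of numpy on made-up data (in MATLAB the final lstsq call would be the LinearModel.fit step):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Synthetic data with a genuine interaction effect plus a little noise.
x1 = rng.normal(2.0, 3.0, n)
x2 = rng.normal(-1.0, 0.5, n)
y = 1.5 * x1 - 2.0 * x2 + 0.8 * x1 * x2 + rng.normal(0, 0.1, n)

def standardize(v):
    return (v - v.mean()) / v.std()

# 1. standardize each variable,
# 2. build the interaction term from the standardized columns,
# 3. run ordinary least squares with an intercept.
z1, z2 = standardize(x1), standardize(x2)
X = np.column_stack([np.ones(n), z1, z2, z1 * z2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # intercept and standardized coefficients
```

Note that standardizing before forming the interaction (rather than after) is what keeps the main-effect coefficients interpretable at the mean of the other variable; re-standardizing the product afterwards would change the scale but not the fit.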

Backward elimination technique in MATLAB

I am a student in the statistics department and I have a thesis about the factors of daily life behaviors related to obesity.
I gave a test to 200 people and asked 30 questions, like whether they smoke, fast-food consumption, etc.
My question is: how can I find the significant variables most related to the obesity outcome, using backward elimination or forward selection in MATLAB?
I am new to MATLAB and don't have any idea where to start. Could somebody please help me?
If you have access to Statistics Toolbox, take a look at the functions stepwisefit and sequentialfs. Both carry out forms of forward and backward feature selection. stepwisefit does stepwise linear regression, whereas sequentialfs is for general purpose sequential feature selection applicable to many model types.
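If you want to see the mechanics of backward elimination itself, here is a simple numpy sketch. Note the differences from the MATLAB functions: it uses adjusted R^2 as the drop criterion (stepwisefit uses p-values), and the survey variables and data are made up for illustration:

```python
import numpy as np

def backward_eliminate(X, y, names, threshold=0.0):
    """Greedy backward elimination: repeatedly drop the predictor whose
    removal improves adjusted R^2 the most, until no removal helps."""
    def adj_r2(cols):
        A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = ((y - A @ coef) ** 2).sum()
        ss_tot = ((y - y.mean()) ** 2).sum()
        n, p = len(y), len(cols)
        return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

    kept = list(range(X.shape[1]))
    while len(kept) > 1:
        current = adj_r2(kept)
        scores = [(adj_r2([j for j in kept if j != drop]), drop)
                  for drop in kept]
        best, drop = max(scores)
        if best <= current + threshold:
            break
        kept.remove(drop)
    return [names[j] for j in kept]

# Hypothetical survey: only "fastfood" and "exercise" truly drive y.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
names = ["smoking", "fastfood", "exercise", "sleep"]
y = 2.0 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(0, 0.5, n)
print(backward_eliminate(X, y, names))
```

The truly informative variables should always survive elimination; whether every noise variable gets dropped depends on the sample and the criterion, which is why stepwise methods should be treated as exploratory.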

How do I decide which Neural Network and learning method to use in a particular case?

I am new to neural networks and I need to determine the pattern among a given set of inputs and outputs. So how do I decide which neural network to use for training, or even which learning method? I have little idea about the pattern or relation between the given inputs and outputs.
Any sort of help will be appreciated. If you want me to read some stuff, then it would be great if links were provided.
If any more info is needed, please say so.
Thanks.
Choosing the right neural network is something of an art form. It's a bit difficult to give generic suggestions, as the best NN for a situation will depend on the problem at hand. As with many of these problems, neural networks may or may not be the best solution. I'd highly recommend trying out different networks and testing their performance against a held-out test data set. When I did this, I usually used the ANN tools in the R software package.
Also keep your mind open to other statistical learning techniques; things like decision trees and support vector machines may be a better choice for some problems.
I'd suggest the following books:
http://www.amazon.com/Neural-Networks-Pattern-Recognition-Christopher/dp/0198538642
http://www.stats.ox.ac.uk/~ripley/PRbook/#Contents