Fuzzy c-means tcp dump clustering in matlab - matlab

Hi I have some data thats represented like this:
0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.
Its from the kdd cup 1999 which was based on the darpa set.
the text file I have has rows and rows of data like this, in matlab there is the generic clustering tool you can use by typing findcluster but it only accepts .dat files.
Im also not very sure if it will accept the format like this. Im also not sure why there is so many trailing zeros in the dump files.
Can anyone help how I can utilise the text document and run it thru a fcm clustering method in matlab? Code help is really needed.

FINDCLUSTER is simply a GUI interface for two clustering algorithms: FCM and SUBCLUST
You first need to read the data from file, look into the TEXTSCAN function for that.
Then you need to deal with non-numeric attributes; either remove them or convert them somehow. As far as I can tell, the two algorithms mentioned only support numeric data.
Visit the original website of the KDD cup dataset to find out the description of each attribute.

Related

How to extract digits (number) using Matlab

At work I have to record a lot of data from png data. Every time I have to manually record the digits (e.g. mean\SD 101.1\11) on the excel sheet and read it with Matlab. Would it be possible that Matlab could directly read the digits from the PNG image, so that lots of work could be saved?
I know it might involve pattern recognition, but still hope that there may be someone who has done this before.
You can make use of Optical Character Recognition (OCR). The code for it is available here

Generating txt file in complex format from Matlab datas

I'm relatively new to Matlab and currently using it to calculate pressure cards for rapid dynamic applications on RADIOSS.
The function is done and can calculate Time-Pressure points.
For the moment I generated only .ascii files to import as curves into the software but I'd like to directly write a text file readable by RADIOSS. (after conversion)
The formatting I need is very specific and I'd like to know if such a thing is possible to do on Matlab. I've been searching on my own for some time now and didn't find really specific formatting options so I come seeking for your advice.
For example I have n time Arrays Te{1 to n} an n Pressure Arrays Pr{1 to n} the format needed is presented in the image linked. How can it be done if it is possible ?
The sprintf function is quite powerful and should provide all the facilities you need. Having looked at the image you linked, I don't see anything particularly special.

Need a method to store a lot of data in Matlab

I've asked this before, but I feel I wasn't clear enough so I'll try again.
I am running a network simulation, and I have several hundreds output files. Each file holds the simulation's test result for different parameters.
There are 5 different parameters and 16 different tests for each simulation. I need a method to store all this information (and again, there's a lot of it) in Matlab with the purpose of plotting graphs using a script. suppose the script input is parameter_1 and test_2, so I get a graph where parameter_1 is the X axis and test_2 is the Y axis.
My problem is that I'm not quite familier to Matlab, and I need to be directed so it doesn't take me forever (I'm short on time).
How do I store this information in Matlab? I was thinking of two options:
Each output file is imported separately to a different variable (matrix)
All output files are merged to one output file and imprted together. In the resulted matrix each line is a different output file, and each column is a different test. Problem is, I don't know how to store the simulation parameters
Edit: maybe I can use a dataset?
So, I would appreciate any suggestion of how to store the information, and what functions might help me fetch the only the data I need.
If you're still looking to give matlab a try with this problem, you can iterate through all the files and import them one by one. You can create a list of the contents of a folder with the function
ls(name)
and you can import data like this:
A = importdata(filename)
if your data is in txt files, you should consider this Prev Q
A good strategy to avoid cluttering your workspace is to import them all into a single matrix. SO if you have a matrix called VAR, then VAR{1,1}.{1,1} could be where you put your test results and VAR{1,1}.{2,1} could be where you put your simulation parameters of the first file. I think that is simpler than making a data structure. Just make sure you uniformly place the information in the same indexes of the arrays. You could also organize your VAR row v col by parameter vs test.
This is more along the lines of your first suggestion
Each output file is imported separately to a different variable
(matrix)
Your second suggestion seems unnecessary since you can just iterate through your files.
You can use the command save to store your data.
It is very convenient, and can store as much data as your hard disk can bear.
The documentation is there:
http://www.mathworks.fr/help/techdoc/ref/save.html
Describe the format of text files. Because if it has a systematic format then you can use dlmread or similar commands in matlab and read the text file in a matrix. From there, you can plot easily. If you try to do it in excel, it will be much slower than reading from a text file. If speed is an issue for you, I suggest that you don't go for Excel.

How could I write the following functions in Matlab for MS protein analysis?

I need your help. I have more than 40000 proteins in fasta file format.
First I want to write a function:
that is able to calculate the masses of the b- and y-ions
that creates a peptide database from the target proteins (mat-file)
that creates a peptide database coming from the decoy proteins (mat-file)
Then, I want to:
load the observed data
filter the peptide databases for candidate peptides given a certain ppm accuracy
write a function that scores the candidate peptides against the observed data
Come up with a thresholding scheme to discern bonafide peptide spectrum matches from the bogus ones
To get started with, FASTA is text file format. To write text files check MATLAB documentation of fopen, fprintf and fclose. To load the text from the data files you've written you can use fopen, fscanf and fclose. Actually, MATLAB has fastainfo, fastaread and fastawrite too. You should check MATLAB documentation of these commands and of other FASTA-related and protein analysis related commands which could be useful for you (I haven't done protein analysis, so I can't say which are the ones you'll need).
Further, you are asking so many things in one question that it's not possible to answer them all, because your question IMHO is kind of "How I write my entire program?". But I think that you could get started with the commands I have listed, and when you have some code written and a well-defined problem that you've attempted to solve yourself, then you could post a new question about it, with the relevant parts of your code.
MATLAB's Bioinformatics Toolbox contains building block routines that you can put together to achieve this. If you have a specific problem when putting them together, post the specific question.

Sample data for testing binary linear classificaion code

I am loking for some sample binary data for testing my linear classifiation code. I need a data set where the data is 2d and belongs to either one of two classes. If anyone has such data or any reference for the same, kindly reply. Any help is appreciated.
I have my own dataset which contain 2 categories of data with 2 features each.
http://dl.dropbox.com/u/28068989/segmentation_mi_kit.zip
Extract this archive and go to 'segmentation_mi_kit/mango_banyan_dataset/'
Alternately if you want something standard to test your algorithm on, have a look at UCI Machine Learning dataset : http://www.ics.uci.edu/~mlearn/
I guess thatz a kind of data you need.