Need a method to store a lot of data in Matlab - matlab

I've asked this before, but I feel I wasn't clear enough so I'll try again.
I am running a network simulation, and I have several hundred output files. Each file holds the simulation's test results for a different set of parameters.
There are 5 different parameters and 16 different tests for each simulation. I need a method to store all this information (and again, there's a lot of it) in Matlab so that I can plot graphs from a script. Suppose the script input is parameter_1 and test_2; then I get a graph where parameter_1 is the X axis and test_2 is the Y axis.
My problem is that I'm not very familiar with Matlab, and I need some direction so it doesn't take me forever (I'm short on time).
How do I store this information in Matlab? I was thinking of two options:
Each output file is imported separately to a different variable (matrix)
All output files are merged into one output file and imported together. In the resulting matrix each line is a different output file, and each column is a different test. Problem is, I don't know how to store the simulation parameters
Edit: maybe I can use a dataset?
So, I would appreciate any suggestion on how to store the information, and on what functions might help me fetch only the data I need.

If you're still looking to give matlab a try with this problem, you can iterate through all the files and import them one by one. You can create a list of the contents of a folder with the function
ls(name)
and you can import data like this:
A = importdata(filename)
if your data is in txt files, you should consider this Prev Q
A good strategy to avoid cluttering your workspace is to import them all into a single cell array. So if you have a cell array called VAR, then VAR{1,1}.tests could be where you put the test results of the first file and VAR{1,1}.params could be where you put its simulation parameters. I think that is simpler than making a separate data structure for every file. Just make sure you place the information in the same fields and indexes uniformly. You could also organize VAR row vs. column by parameter vs. test.
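A minimal sketch of that loop (the folder name, file pattern and the way you obtain the parameters are assumptions; adjust them to your own setup):

% List the output files (folder name and pattern are placeholders)
files = dir(fullfile('results', '*.txt'));

VAR = cell(numel(files), 1);
for k = 1:numel(files)
    filename = fullfile('results', files(k).name);
    raw = importdata(filename);      % numeric matrix, or a struct, depending on the file

    VAR{k}.tests  = raw;             % the 16 test results of this file
    VAR{k}.params = [1 2 3 4 5];     % the 5 simulation parameters -- however you obtain them (e.g. parsed from the file name)
end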
This is more along the lines of your first suggestion
Each output file is imported separately to a different variable
(matrix)
Your second suggestion seems unnecessary since you can just iterate through your files.

You can use the command save to store your data.
It is very convenient, and can store as much data as your hard disk can bear.
The documentation is there:
http://www.mathworks.fr/help/techdoc/ref/save.html
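For example (variable names and sizes are just placeholders):

% Save two variables to a mat-file, then load them back in the plotting script
results = rand(16, 5);                      % placeholder test results (16 tests x 5 runs)
params  = [0.1 0.2 0.3 0.4 0.5];            % placeholder parameter values

save('simulation_data.mat', 'results', 'params');

% later, in the plotting script
S = load('simulation_data.mat');
plot(S.params, S.results(2, :));            % parameter on X, test_2 on Y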

Describe the format of the text files. If they have a systematic format, then you can use dlmread or similar commands in Matlab to read a text file into a matrix. From there, you can plot easily. If you try to do it in Excel, it will be much slower than reading from a text file, so if speed is an issue for you, I suggest you don't go with Excel.
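A small sketch, assuming the file holds delimited numbers (the file name and the column meanings are placeholders):

M = dlmread('sim_output_01.txt');   % reads the whole file into a numeric matrix

param_1 = M(:, 1);                  % assume column 1 holds the parameter values
test_2  = M(:, 7);                  % assume column 7 holds test_2
plot(param_1, test_2);
xlabel('parameter\_1'); ylabel('test\_2');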

Related

Can I automatically enumerate figures or keep tokens in matlab?

In a live script in matlab, I plot multiple figures, and I use this code to enumerate the figures:
FigureQuantity=1
plot(data_1)
title('Figure '+string(FigureQuantity))
Then on another code section I do it again
FigureQuantity=FigureQuantity+1
plot(data_n)
title('Figure '+string(FigureQuantity))
The problem is that if I run the last code section again, FigureQuantity gets updated and the enumeration of figures gets broken.
Is there any way to get the number of the tokens ordered by their appearance in the live script's code (independent of how many times a code section is run)?
I would like to keep tokens so I can mix inserted images and plots. And I want to export the document as PDF (not to show plots in an application or an independent window).
What I need is something like MS Word enumeration of figures and tables.
I found this Matlab documentation: Number Section Headings, Table Titles, and Figure Captions Programmatically, but it appears to be used for creation of MS Word or HTML documents, and not to enumerate images on Matlab live scripts.
I do not understand how to use it, or whether that is its purpose in Matlab.
I'm assuming you're updating the data_n variable live as well; otherwise, if you're defining those data variables manually, then avoiding doing the same for the figure counter isn't really the solution I think you're looking for.
Why not for-loop through the figure updates?
for FigureQuantity = 1:numberOfFigureQuantities
    figure(FigureQuantity);                  % select (or create) figure number FigureQuantity
    hold on;
    plot(data_n(FigureQuantity))             % index into your data however it is stored
    title(strcat('Figure Number: ', num2str(FigureQuantity)));
end
The figure count corresponding to the FigureQuantity will index the appropriate figure and will update that figure if it already existed. This is the solution I think you're looking for; if not, please clarify.

How can I import ground truth data into Matlab for the training of a (faster) R-CNN?

I have a large, labelled dataset which I have created, and I would like to provide it to Matlab to train an R-CNN (using the faster R-CNN algorithm).
How can this be done?
The built-in labeller provided by Matlab requires that the user manually load each data sample and label it with a graphical user interface.
This is not practical for me as the set is already labelled and it contains 500,000 samples.
It should be noted that I can control the format in which the data set is stored, so I can create .csv files or Excel files if needed.
I have tried two directions:
1. Creating a mat-file similar to the one created by the labeller.
2. Looking for ways within Matlab to import the data from .csv or Excel files.
I have had no success with either method.
For Direction 1:
Though there are many libraries that can open mat files, they are not able to open or create files similar to the Matlab ground truths because these are not simple matrices (the cells themselves contain matrices of varying dimensions that represent the bounding boxes of each classified object). Moreover, though the Matlab Level 5 file format is open source I have not been successful in using it to write my own code (C# or C++) to parse and write such files.
For Direction 2:
There are generic methods in Matlab to load .csv and excel files but I do not know how to organize these files in such a way as to produce the structure that the labeller creates and that is consumed by the fasterRCNN trainer.
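Edit: for direction 2, this is the kind of conversion I have been attempting, based on my reading of the trainFasterRCNNObjectDetector documentation (it seems to accept a table whose first column holds image file names and whose remaining columns hold M-by-4 [x y width height] boxes, one column per class). The CSV column names below are my own, and I am not sure this produces the structure the trainer actually wants:

T = readtable('labels.csv');    % my own CSV layout: imageFile, className, x, y, w, h (one box per row)

images  = unique(T.imageFile, 'stable');
classes = unique(T.className, 'stable');

trainingData = table(images, 'VariableNames', {'imageFilename'});
for c = 1:numel(classes)
    boxes = cell(numel(images), 1);
    for i = 1:numel(images)
        rows = strcmp(T.imageFile, images{i}) & strcmp(T.className, classes{c});
        boxes{i} = [T.x(rows), T.y(rows), T.w(rows), T.h(rows)];   % M-by-4 [x y width height]
    end
    trainingData.(classes{c}) = boxes;   % class names must be valid Matlab variable names
end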

running NN software with my own data

New with Matlab.
When I try to load my own data using the NN pattern recognition app window, I can load the source data, but not the target (it is never in the drop-down list). Both source and target are in the same directory. The source is 5000 observations with 400 variables per observation, and the target can take on 10 different values (recognizing digits). Any ideas?
Before you do anything with your own data, you might want to try out the example data sets available in the toolbox. Those definitely work, so they make it easier to find problems later on: you can see whether the issue is with your own code or data.
Regarding your actual question: Without more details, e.g. what your matrices contain and what their dimensions are, it's hard to help you. In your case some of the problems mentioned here might be similar to yours:
http://www.mathworks.com/matlabcentral/answers/17531-problem-with-targets-in-nprtool
From what I understand about nprtool, your targets have to be a matrix with exactly one 1 per sample (marking the correct class) in either each row or each column (depending on the orientation of your input matrix), so make sure that's the case.
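For example, if your labels are a 5000x1 vector of digits 0..9 (the variable names X and labels here are placeholders for your own data), something like this should give a valid target matrix with one column per sample:

inputs  = X';                            % X assumed 5000x400; one column per sample: 400x5000
targets = full(ind2vec(labels' + 1));    % shift 0..9 to 1..10, then one-hot encode: 10x5000

% manual equivalent, without ind2vec
targets2 = zeros(10, numel(labels));
targets2(sub2ind(size(targets2), labels' + 1, 1:numel(labels))) = 1;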

using mapreduce programming technique in matlab

I am studying rat ultrasonic vocalisations (their speech in ultrasound). I have several audio wav files of the rats' calls. Ideally, I would import a whole file into Matlab and just process it, but I run into memory issues even with the smallest 70 MB file. This is what I want help with.
[y, Fs] = audioread('T0000201.wav');
[S, F, T] = spectrogram(y, 100, [], 256, Fs, 'yaxis');
..
..
..rest of program
I could consider breaking the audio (in one file) into blocks, and process the block before considering the next block, but I'm not sure what I would do for cases where rat calls are cut off half way through, at the end of the blocks (this might have a negative impact on the STFT spectrogram).
I came across another technique called "Mapreduce" which seems to allow me to use the entirety of my data without actually reading it in. While this seems most ideal, I don't quite understand how it works or can be implemented. "Hadoop" has also been mentioned. Can anyone provide any assistance?
I am currently using this (http://uk.mathworks.com/help/matlab/import_export/find-maximum-value-with-mapreduce.html) for reference. My first step was trying to use the wav file as the data store (like the csv file in the example) but that didn't work.
Since you're working primarily with a repository of audio (.wav) files, mapreduce might not be your best option. The datastore function only works with text files or key-value files.
Use the memory function to explore what the limits of memory are for MATLAB, and try processing the audio files in smaller blocks as you mentioned. Using a combination of audioread(), audioinfo(), and audiowrite(), you can break your collection of audio files up into a larger collection of smaller files that can then be individually processed.
If you have a small number of files to work with, then you can manually inspect the smaller blocks to make sure no important rat calls are cut off between blocks. Of course if you have thousands of files to work with then that approach won't be feasible.
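A rough sketch of the block-wise approach, with a small overlap between blocks so a call that straddles a boundary still appears whole in at least one block (the block and overlap lengths are arbitrary choices):

info     = audioinfo('T0000201.wav');
Fs       = info.SampleRate;
blockLen = 10 * Fs;                % 10-second blocks (arbitrary)
overlap  = 1 * Fs;                 % 1-second overlap between consecutive blocks (arbitrary)

startSample = 1;
while startSample <= info.TotalSamples
    endSample = min(startSample + blockLen - 1, info.TotalSamples);
    y = audioread('T0000201.wav', [startSample, endSample]);   % read only this block

    [S, F, T] = spectrogram(y, 100, [], 256, Fs);
    % ... process this block's spectrogram ...

    if endSample >= info.TotalSamples
        break;                     % last block processed
    end
    startSample = endSample - overlap + 1;
end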

How could I write the following functions in Matlab for MS protein analysis?

I need your help. I have more than 40000 proteins in fasta file format.
First I want to write a function:
that is able to calculate the masses of the b- and y-ions
that creates a peptide database from the target proteins (mat-file)
that creates a peptide database coming from the decoy proteins (mat-file)
Then, I want to:
load the observed data
filter the peptide databases for candidate peptides given a certain ppm accuracy
write a function that scores the candidate peptides against the observed data
Come up with a thresholding scheme to discern bona fide peptide-spectrum matches from the bogus ones
To get started: FASTA is a text file format. To write text files, check the MATLAB documentation for fopen, fprintf and fclose. To load the text from the data files you've written, you can use fopen, fscanf and fclose. Actually, MATLAB also has fastainfo, fastaread and fastawrite. You should check the MATLAB documentation of these commands and of the other FASTA-related and protein-analysis commands that could be useful for you (I haven't done protein analysis, so I can't say which ones you'll need).
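For example, a very rough sketch of reading the proteins with fastaread and saving a peptide database to a mat-file (the file names are placeholders, and the actual digestion and b-/y-ion mass calculation is left as a stub since it depends on your fragmentation rules):

% fastaread returns a struct array with Header and Sequence fields
proteins = fastaread('target_proteins.fasta');

peptideDB = cell(numel(proteins), 1);
for k = 1:numel(proteins)
    seq = proteins(k).Sequence;
    % ... digest seq into peptides and compute the b-/y-ion masses here ...
    peptideDB{k} = seq;                        % store whatever your digestion function returns
end

save('target_peptide_db.mat', 'peptideDB');    % the peptide database as a mat-file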
Further, you are asking so many things in one question that it's not possible to answer them all, because your question IMHO is essentially "How do I write my entire program?". But I think you could get started with the commands I have listed, and when you have some code written and a well-defined problem that you've attempted to solve yourself, you could post a new question about it with the relevant parts of your code.
MATLAB's Bioinformatics Toolbox contains building block routines that you can put together to achieve this. If you have a specific problem when putting them together, post the specific question.