Running NN software with my own data - Matlab

New to Matlab.
When I try to load my own data using the NN pattern recognition app window, I can load the source data, but not the target (it never appears in the drop-down list). Both source and target are in the same directory. The source is 5000 observations with 400 variables per observation, and the target can take on 10 different values (recognizing digits). Any ideas?

Before you do anything with your own data, you might want to try out the example data sets available in the toolbox. Since those are known to work, they make it much easier to isolate problems in your own code later on.
Regarding your actual question: without more details, e.g. what your matrices contain and what their dimensions are, it's hard to help you. Some of the problems mentioned here might be similar to yours:
http://www.mathworks.com/matlabcentral/answers/17531-problem-with-targets-in-nprtool
From what I understand about nprtool, your target matrix must contain exactly one 1 per observation (marking the correct class), arranged along rows or columns to match the orientation of your input matrix, so make sure that's the case.
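As a minimal sketch (the variable names are hypothetical), this is one way to build such a target matrix for the digit problem described above, using ind2vec from the Neural Network Toolbox:
% Turn a 5000x1 vector of digit labels (values 1..10) into the one-hot
% target matrix nprtool expects: one column per observation, one 1 per column.
labels  = randi(10, 5000, 1);        % stand-in for your actual digit labels
targets = full(ind2vec(labels'));    % 10x5000 matrix of 0s and 1s
% The inputs should then be oriented the same way, one observation per
% column, i.e. 400x5000 if your source matrix is 5000x400:
% inputs = sourceData';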

How to plot all the stream lines in ParaView?

I am simulating the lid-driven cavity case and trying to get all the stream lines with ParaView's stream tracer, but I only get the ones that intersect the reference line, and because of that some vortices are not visible. How can I see all the stream lines in the domain?
Thanks a lot in advance.
To add a little bit to Mathieu's answer, if you really want streamlines everywhere, then you can create a Stream Tracer With Custom Source (as Mathieu suggested) and set your data to both the Input and the Seed Source. That will create a streamline originating from every point in your dataset, which is pretty much what you asked for.
However, while you can do this, you will probably not be happy with the results. First of all, unless your data is trivially small, this will take a long time to compute and create a large amount of data. Even worse, the result will be so dense that you won't be able to see anything. You will get all those interesting streamlines through vortices, but they will be completely hidden by all the boring streamlines around them.
Thus, you are better off trying to derive a data set that contains seed points likely to trace a stream through the vortices you are interested in. One thing you might want to try is to compute the vorticity of your vector field (Gradient Of Unstructured Data Set with the advanced option Compute Vorticity turned on), find its magnitude (Calculator), and then use the Threshold filter to pull out the cells with large vorticity. Then use that as your Seed Source.
Another (probably better) option if your data is 2D or you can extract an interesting surface along the flow of your data is to use the Surface LIC plugin. Details can be found at https://www.paraview.org/Wiki/ParaView/Line_Integral_Convolution.
You have to choose a representative source for your streamlines.
You could, for example, use a "Sphere Source", set in the StreamTracer properties.
If that fails, you can use a StreamTracerWithCustomSource and provide your own source, which you will have to create yourself first.

How to automatically optimize a classifier in Weka so that a given class contains only 100% sure data?

I have two (or three) classes, and each class can only possess one label.
I want to optimize (automatically if possible) the parameters and thresholds of classifiers so that my first class contains only 100% sure data, even if it ends up with a small number of instances.
I don't mind if the remaining classes contain false alarms or correct rejections.
I don't mind having unclassified data.
I have already searched on Stack Overflow and on Weka's wiki, but maybe my lack of knowledge concerning Weka made me miss some keywords.
I also tried to perform the task with the well-known "iris" database, but I think that in this case any class can be 100% sure.
So far I have only succeeded in testing multiple classifiers and tuning them manually, but without achieving 100% correctness for my first class. (I checked this result in the confusion matrix given by Weka's report.)
I know it is possible for my class to contain 100% sure data, because I managed to do it in Matlab with simple thresholds set manually. But I would like to try a bigger database, obtain better thresholds, and use the power of Weka.
Any suggestions would be helpful, thanks!
You probably need the "Cost Sensitive Classifier" among "meta" classifiers.
If you are working in the Explorer, you can configure it in the classifier's dialog.
Choose your "classifier" (something beyond ZeroR :) ).
Set your "cost matrix". For a 2-class problem this will be a 2x2 matrix.
By setting one off-diagonal component very large (>>1, say 1000) you make one type of misclassification 1000 times more expensive than the other. To keep your "first" class pure, the expensive error should be assigning an instance to that class incorrectly. This should do the job.
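As a sketch, assuming two classes a and b where a is the class that must stay pure, and taking rows as the actual class and columns as the predicted class (double-check this orientation in the cost matrix editor), the matrix could look like this:
              predicted a   predicted b
actual a           0              1
actual b         1000             0
With these costs, putting a b instance into class a is 1000 times more expensive than the reverse, so the classifier should only assign a when it is very confident.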

Bootstrapping in Matlab - how many original data points are used?

I have data sets for two groups, with one being much smaller than the other. For that reason, I am using the Matlab bootstrapping function to estimate the performance of the smaller group. I have code that draws on my original data and generates 1000 'new' means. However, it is not clear how many of the original data points are used each time. Obviously, if all the original data were used every time, the same mean would be generated over and over.
Can anyone help me out with this?
Bootstrapping comes from sampling with replacement. You'll use the same number of points as the original data, but some of them will be repeated. There are some variants of bootstrapping which work slightly differently, however. See https://en.wikipedia.org/wiki/Bootstrapping_(statistics).
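A minimal sketch of the standard variant (the data here is made up), using bootstrp from the Statistics Toolbox:
% Each bootstrap resample draws n points from the original n points
% WITH replacement, so some points repeat and the resampled mean varies.
x = randn(30, 1);                    % stand-in for the smaller group
means = bootstrp(1000, @mean, x);    % 1000 bootstrap means
% One resample done by hand, to see the repetition:
idx = randi(numel(x), numel(x), 1);  % n indices drawn with replacement
fprintf('%d of %d original points appear in this resample\n', ...
    numel(unique(idx)), numel(x));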

How to deal with missing values in Kruskal-Wallis test in Matlab?

The Matlab documentation seems unclear about how to ignore missing data when using kruskalwallis, the Kruskal-Wallis test (or any related test). The same goes for unequal group sizes.
Very late answer, but I ran into the same problem myself today, might as well help some future searcher.
The solution is pretty straightforward. kruskalwallis is primarily used on matrices and by default compares equal-sized columns, but it does allow you to instead assign groups manually, with the optional variable "group". I was attempting to check whether a single value was unlikely to belong to a distribution from a different set, so this was straightforward. I just added the value I wanted to test on to the end of the set I was testing against, then made "group" a vector of ones the same size as the set, with a "2" added to the end for the new value. Looks like it worked quite nicely.
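A minimal sketch of that trick (the numbers are made up):
set1 = randn(50, 1);                % the set being tested against
newValue = 3.2;                     % the single value to check
x = [set1; newValue];               % append the value to the end of the set
group = [ones(numel(set1), 1); 2];  % ones for the set, a 2 for the new value
p = kruskalwallis(x, group);        % a small p suggests the value doesn't belong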
For numeric data, the standard missing data value in Matlab is NaN. See ismissing. See also this article from The MathWorks. For tables, you might find standardizeMissing helpful, as well as replaceWithMissing for dataset objects. I can't say anything about group size.
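A minimal sketch combining both points (missing values and unequal group sizes; the data is made up): pool the observations, label them with a group vector, and drop the NaN entries before the test.
grp1 = [1.2; 3.4; NaN; 2.2; 5.1];   % a group with a missing value
grp2 = [2.0; 4.4; 3.9];             % a different-sized group is fine this way
x = [grp1; grp2];
group = [ones(numel(grp1), 1); 2*ones(numel(grp2), 1)];
keep = ~isnan(x);                   % or ~ismissing(x) in newer releases
p = kruskalwallis(x(keep), group(keep));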

Need a method to store a lot of data in Matlab

I've asked this before, but I feel I wasn't clear enough so I'll try again.
I am running a network simulation, and I have several hundred output files. Each file holds the simulation's test results for different parameters.
There are 5 different parameters and 16 different tests for each simulation. I need a method to store all this information (and again, there's a lot of it) in Matlab for the purpose of plotting graphs with a script. Suppose the script input is parameter_1 and test_2; then I get a graph where parameter_1 is the X axis and test_2 is the Y axis.
My problem is that I'm not very familiar with Matlab, and I need some direction so it doesn't take me forever (I'm short on time).
How do I store this information in Matlab? I was thinking of two options:
Each output file is imported separately to a different variable (matrix)
All output files are merged into one output file and imported together. In the resulting matrix each line is a different output file, and each column is a different test. Problem is, I don't know how to store the simulation parameters
Edit: maybe I can use a dataset?
So, I would appreciate any suggestion of how to store the information, and what functions might help me fetch only the data I need.
If you're still looking to give Matlab a try with this problem, you can iterate through all the files and import them one by one. You can create a list of the contents of a folder with the function
ls(name)
and you can import data like this:
A = importdata(filename)
If your data is in txt files, you should consider this previous question.
A good strategy to avoid cluttering your workspace is to import them all into a single variable. So if you have a cell array called VAR, then VAR{1,1}.results could be where you put your test results and VAR{1,1}.params could hold the simulation parameters of the first file. I think that is simpler than making a separate data structure per file. Just make sure you place the information uniformly in the same fields everywhere. You could also organize VAR row vs. column by parameter vs. test.
This is more along the lines of your first suggestion:
Each output file is imported separately to a different variable (matrix)
Your second suggestion seems unnecessary since you can just iterate through your files.
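A minimal sketch of that loop (the folder name, file pattern, and field names are hypothetical):
files = dir('output/*.txt');         % list all output files in the folder
VAR = cell(numel(files), 1);
for k = 1:numel(files)
    data = importdata(fullfile('output', files(k).name));
    VAR{k}.results = data;           % the 16 test results from this file
    VAR{k}.params = files(k).name;   % e.g. if the parameters are encoded in the name
end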
You can use the command save to store your data.
It is very convenient, and can store as much data as your hard disk can bear.
The documentation is there:
http://www.mathworks.fr/help/techdoc/ref/save.html
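A minimal sketch (the file and variable names are made up):
VAR = {magic(4)};                      % stand-in for the imported data
save('simulation_data.mat', 'VAR');    % write VAR to a MAT-file on disk
clear;                                 % later, in a fresh session...
load('simulation_data.mat');           % ...this restores VAR into the workspace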
Describe the format of the text files. If they have a systematic format, you can use dlmread or similar commands in Matlab to read each text file into a matrix. From there, you can plot easily. If you try to do it in Excel, it will be much slower than reading from a text file, so if speed is an issue for you, I suggest you don't go for Excel.
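A minimal sketch (the file name and column layout are hypothetical), assuming a plain numeric text file:
M = dlmread('run_01.txt');    % reads a delimited numeric file into a matrix
plot(M(:, 1), M(:, 2));       % e.g. parameter_1 on the X axis, test_2 on the Y axis
xlabel('parameter\_1');
ylabel('test\_2');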