I have a question about the chi-square test.
I have two between-subject factors, each with two levels (so 4 conditions). Furthermore, I have one dependent variable (qualitative), also consisting of two levels.
Now I want to make pairwise comparisons (so I have 6 chi-sqaure test in total). Is there any way I can control type-1-errors? In the literature I saw they often calculated interaction with a chi-sqaure test. Is this the way to do it, and if so, how do I do it?
I can work with both SPSS and Matlab.
Thank in advance!
Niels
Related
Dears,
I want to understand if I using not properly the CP-SAT algorithm. Basically my code automatically creates a model reading a csv with a dataset. My code creates model.NewBoolVar() for each record of the dataset multiplied for the number of possible decisions to be taken by the optimization problem...
For example if I have a dataset with 1 Milion of records and I have to decide between 3 options, the model will contains 3 Milions of boolean variables. The combination of the 3 Milions of booleans is the solution to my optimizzation problem.
Currently after 100K variables the program is becoming unstable and python crashes. Do you think that I'm trying to use CP-SAT not properly? Do you have experience with this kind of volumes?
Thank you very much.
Cheers
You are aware that this is an NP problem.
Thus potentially, you are creating a search tree of size 2^3000000000.
I have two (or three) classes and each classes can only possess one label.
I want to optimize (automatically if possible) parameters and thresholds of classifiers in order for my first class to contain only 100 % sure data. Even if it contains a small number of instances.
I don't mind for the remaining classes to contain false alarm or correct rejection.
I don't mind to have unclassified data.
I have already been searching on stackoverflow and on the weka's wiki but maybe my lack of knowledge concerning weka made me miss some keywords.
I also tried to perform the task with the well-known "iris" database but I think that in this case, any class can be 100 % sure.
Yet, I have only succeed in testing multiple classifiers and tuning them manually but without performing 100 % correct for my first class. (I checked this result in the confusion matrix given by weka's report.)
Somehow, I know it is possible for my class to contain 100% sure data because I managed to do it in Matlab with simple threshold set manually. But I would like to try out a bigger database, to obtain better threshold and to use the power of weka.
Any suggestions would be helpful, thanks !
You probably need the "Cost Sensitive Classifier" among "meta" classifiers.
If you are working in the Explorer, here is the dialog you get.
Choose the your "classifier" (something beyond ZeroR :) ).
Set your "cost matrix". For 2-class problem this will be 2x2 matrix.
By setting one non-diagonal component very large (>>1, let us say 1000) you ensure that misclassifying one class (your "first" class) is 1000 times more expensive than misclassifying another class. This should do the job.
I'm new to Dymola and I have to implement a chemical reactor in Dymola.
I modeled the behaviour of the reactor in 3 different models, because the reactor behaves different depending on a variable x. So that model a is valid for x<=0.1, model 2 is valid for 0.75>x<0.1 and model 3 is valid for x>0.75. Is there any way to run only one of the three models in each simulation step? I have looked into the "if" statement to put all 3 model equations in one model, but that didn't work. Is there anyone out there who can help me? Any hint would be great! Thank you!
Modelica does not handle variable structure problems. What this means is that the set of variables cannot change during the simulation.
Most people who are trying to solve such systems typically find a way to keep all variables present but somehow "deactivate" different sets by switching equations (which can, to some extent, change during the simulation).
If you give a bit more information about the types of models you need to switch between, I could try to give you some hints about how to "deactivate" them from one phase to another.
I have several datasets i.e. matrices that have a 2 columns, one with a matlab date number and a second one with a double value. Here an example set of one of them
>> S20_EavesN0x2DEAir(1:20,:)
ans =
1.0e+05 *
7.345016409722222 0.000189375000000
7.345016618055555 0.000181875000000
7.345016833333333 0.000177500000000
7.345017041666667 0.000172500000000
7.345017256944445 0.000168750000000
7.345017465277778 0.000166875000000
7.345017680555555 0.000164375000000
7.345017888888889 0.000162500000000
7.345018104166667 0.000161250000000
7.345018312500001 0.000160625000000
7.345018527777778 0.000158750000000
7.345018736111110 0.000160000000000
7.345018951388888 0.000159375000000
7.345019159722222 0.000159375000000
7.345019375000000 0.000160625000000
7.345019583333333 0.000161875000000
7.345019798611111 0.000162500000000
7.345020006944444 0.000161875000000
7.345020222222222 0.000160625000000
7.345020430555556 0.000160000000000
Now that I have those different sensor values, I need to get them together into a matrix, so that I could perform clustering, neural net and so on, the only problem is, that the sensor data was taken with slightly different timings or timestamps and there is nothing I can do about that from a data collection point of view.
My first thought was interpolation to make one sensor data set fit another one, but that seems like a messy approach and I was thinking maybe I am missing something, a toolbox or function that would enable me to do this quicker without me fiddling around. To even complicate things more, the number of sensors grew over time, therefore I am looking at different start dates as well.
Someone a good idea on how to go about this? Thanks
I think your first thought about interpolation was the correct one, at least if you plan to use NNs. Another option would be to use approaches which are designed to deal with missing data, like http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory for example.
It's hard to give an answer for the clustering part, because I have no idea what you're looking for in the data.
For the neural network, beside interpolating there are at least two other methods that come to mind:
training separate networks for each matrix
feeding them all together to the same network, with a flag specifying which matrix the data is coming from, i.e. something like: input (timestamp, flag_m1, flag_m2, ..., flag_mN) => target (value) where the flag_m* columns are mutually exclusive boolean values - i.e. flag_mK is 1 iff the line comes from matrix K, 0 otherwise.
These are the only things I can safely say with the amount of information you provided.
I'm currently writing an optimization algorithm in MATLAB, at which I completely suck, therefore I could really use your help. I'm really struggling to find a good way of representing a graph (or well more like a tree with several roots) which would look more or less like this:
alt text http://img100.imageshack.us/img100/3232/graphe.png
Basically 11/12/13 are our roots (stage 0), 2x is stage1, 3x stage2 and 4x stage3. As you can see nodes from stageX are only connected to several nodes from stage(X+1) (so they don't have to be connected to all of them).
Important: each node has to hold several values (at least 3-4), one will be it's number and at least two other variables (which will be used to optimize the decisions).
I do have a simple representation using matrices but it's really hard to maintain, so I was wondering is there a good way to do it?
Second question: when I'm done with that representation I need to calculate how good each route (from roots to the end) is (like let's say I need to compare is 11-21-31-41 the best or is 11-21-31-42 better) to do that I will be using the variables that each node holds. But the values will have to be calculated recursively, let's say we start at 11 but to calcultate how good 11-21-31-41 is we first need to go to 41, do some calculations, go to 31, do some calculations, go to 21 do some calculations and then we can calculate 11 using all the previous calculations. Same with 11-21-31-42 (we start with 42 then 31->21->11). I need to check all the possible routes that way. And here's the question, how to do it? Maybe a BFS/DFS? But I'm not quite sure how to store all the results.
Those are some lengthy questions, but I hope I'm not asking you for doing my homework (as I got all the algorithms, it's just that I'm not really good at matlab and my teacher wouldn't let me to do it in java).
Granted, it may not be the most efficient solution, but if you have access to Matlab 2008+, you can define a node class to represent your graph.
The Matlab documentation has a nice example on linked lists, which you can use as a template.
Basically, a node would have a property 'linksTo', which points to the index of the node it links to, and a method to calculate the cost of each of the links (possibly with some additional property that describe each link). Then, all you need is a function that moves down each link, and brings the cost(s) with it when it moves back up.