Degrees of Freedom for Linear Mixed Model Likelihood Ratio Test Statistic - covariance

I am studying linear mixed models (LMMs) and am confused about the degrees of freedom associated with testing covariance parameters that lie on the boundary of the parameter space using restricted maximum likelihood (REML).
Suppose I have a model built from a 2-level clustered dataset with 3 random effects, and I am testing whether just 1 of them is zero. What would be the degrees of freedom for this test?
I am also looking for some background information on this, but nothing too lengthy.
I tried looking up information online but couldn't find anything direct or specific; I am not trying to read a full book on this.

Related

Appropriate method for clustering ordinal variables

I was reading through all (or most) previously asked questions, but couldn't find an answer to my problem...
I have 13 variables measured on an ordinal scale (they represent knowledge transfer channels), which I want to cluster (HCA) for a subsequent binary logistic regression analysis (including all 13 variables is not possible given the sample size of N=208). Factor analysis seems inappropriate given the scale level. I am using SPSS (but tried R as well).
Questions:
1. Am I right in using the chi-squared measure for count data instead of the (squared) Euclidean distance?
2. How can I justify the choice of linkage method? I tried single, complete, Ward, and average linkage, but they all give different results and I can't find a source to base my decision on.
Thanks a lot in advance!
Answer 1: Since the variables are on an ordinal scale, the chi-square measure is appropriate, because "A Chi-square test is designed to analyze categorical data. That means that the data has been counted and divided into categories. It will not work with parametric or continuous data (such as height in inches)." Reference.
Since ordinal data is essentially count or frequency data, you can also summarize it with descriptive statistics (mean, standard deviation, etc.) or compare groups with non-parametric tests such as the Mann-Whitney U test (two groups) or the Kruskal-Wallis H test (three or more groups).
Answer 2: In a clustering problem, the choice of distance measure depends mainly on the type of variables. I recommend reading these detailed posts: 1, 2, 3.
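For concreteness, one common form of the chi-square distance between two count vectors can be computed directly. This is a pure-Python sketch for illustration only; SPSS and R compute this for you when you select the chi-square measure:

```python
def chi_square_distance(x, y):
    # Chi-square distance for count/frequency vectors:
    # sum_i (x_i - y_i)^2 / (x_i + y_i), skipping empty categories.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)
```

A pairwise distance matrix built from this function can then be fed into any hierarchical clustering routine.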

IBM Bluemix - Visual Recognition. Why low scores?

I’m using Visual Recognition service on IBM Bluemix.
I have created some classifiers, in particular two of these with this objective:
first: a “generic” classifier that has to return a confidence score for the recognition of a particular object in the image. I trained it with 50 positive examples of the object and 50 negative examples of something similar to the object (details of it, its components, images like it, etc.).
second: a more specific classifier that recognizes the particular type of the object identified before, if the score of the first classification is high enough. This new classifier was trained like the first one: 50 positive examples of type A objects, 50 negative examples of type B objects. This second categorization should be more specific than the first one, because the images are more detailed and all similar to one another.
The result is that the two classifiers work well, and the expected results of a particular set of images correspond to the truth in most cases, and this should mean that both have been well trained.
But there is a thing that I don’t understand.
In both classifiers, if I try to classify one of the images that was used in the positive training set, my expectation is that the confidence score should be near 90-100%. Instead, I always obtain a score between 0.50 and 0.55. The same thing happens when I try an image very similar to one from the positive training set (scaled, reflected, cropped, etc.): the confidence never goes above roughly 0.55.
I tried creating a similar classifier with 100 positive images and 100 negative images, but the final result never changes.
The question is: why is the confidence score so low? Why is it not near 90-100% for images used in the positive training set?
The scores from Visual Recognition custom classifiers range from 0.0 to 1.0, but they are unitless and are not percentages or probabilities (they do not add up to 100% or 1.0).
When the service creates a classifier from your examples, it is trying to figure out what distinguishes the features of one class of positive_examples from the other classes of positive_examples (and negative_examples, if given). The scores are based on the distance to a decision boundary between the positive examples for the class and everything else in the classifier. It attempts to calibrate the score output for each class so that 0.5 is a decent decision threshold, to say whether something belongs to the class.
However, given the cost-benefit balance of false alarms vs. missed detections in your application, you may want to use a higher or lower threshold for deciding whether an image belongs to a class.
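If you have a small labeled validation set, you can choose that threshold empirically. The helper below is hypothetical and not part of the Visual Recognition API; it simply scans candidate thresholds over your scores and keeps the one with the best accuracy (you could substitute any cost-weighted metric):

```python
def best_threshold(scores, labels):
    # Hypothetical helper: pick the decision threshold that maximizes
    # accuracy on a labeled validation set of classifier scores.
    # `labels` are 1 for positive images, 0 for negative ones.
    best_t, best_acc = 0.5, 0.0
    for t in sorted(set(scores)):
        acc = sum((s >= t) == bool(y) for s, y in zip(scores, labels)) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t
```

With an asymmetric cost for false alarms vs. missed detections, you would replace the accuracy computation with your own cost function.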
Without knowing the specifics of your class examples, I might guess that there is a significant amount of similarity between your classes, that maybe in the feature space your examples are not in distinct clusters, and that the scores reflect this closeness to the boundary.

Can I use more than ten inputs with a single-layer neural network to separate data into two categories?

I have pattern data with 12 input variables, and I want to separate these data into two categories. Can anyone tell me whether this is possible with a single-layer neural network with 12 input values plus a bias term? I also implemented it in MATLAB, but I have some doubts: what would be the best initial weight values (range) and learning rate? Can you please guide me on these points?
Is a single layer enough?
Whether a single hidden layer suffices to correctly label your input data depends on the complexity of your data. You should empirically try different topologies (combinations of layers and number of neurons) until you discover a setting that works for you.
What are the best weight ranges?
The recommended weight range depends on the activation function you intend to use. For the sigmoid function, it is a small interval centered around 0, e.g. [-0.1, 0.1].
What is the ideal learning rate?
The learning rate is often set to a small value such as 0.03, but if the data is easily learned by your network you can often increase it substantially, e.g. to 0.3. Check out this discussion on how learning rates affect the learning process: https://stackoverflow.com/a/11415434/1149632
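Putting the pieces together, here is a minimal sketch of a single-layer network with 12 inputs plus a bias: sigmoid activation, weights initialized in [-0.1, 0.1], trained by gradient descent. It is written in plain Python rather than MATLAB, and the toy linearly separable data set (label 1 when the inputs sum to more than 6) is invented purely for illustration:

```python
import math
import random

random.seed(0)
N_INPUTS = 12

# Weights initialized in a small interval centered on 0, e.g. [-0.1, 0.1],
# as recommended for sigmoid activations; the bias is the last weight.
weights = [random.uniform(-0.1, 0.1) for _ in range(N_INPUTS + 1)]

def predict(x):
    # Single sigmoid unit: sigma(w . x + b)
    z = weights[-1] + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Toy linearly separable data: label 1 if the inputs sum to more than 6.
data = []
for _ in range(200):
    x = [random.random() for _ in range(N_INPUTS)]
    data.append((x, 1 if sum(x) > 6 else 0))

lr = 0.3  # start smaller (e.g. 0.03) and increase if training stays stable
for epoch in range(200):
    for x, y in data:
        err = y - predict(x)  # gradient of the log-loss for a sigmoid unit
        for i in range(N_INPUTS):
            weights[i] += lr * err * x[i]
        weights[-1] += lr * err

accuracy = sum((predict(x) > 0.5) == (y == 1) for x, y in data) / len(data)
print(round(accuracy, 2))
```

Because this toy problem is linearly separable, a single layer suffices here; for more complex decision boundaries you would need hidden layers, as the answer above notes.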
A side note
You should search the Web for a few pointers and tips, and post more to-the-point questions on Stack Overflow.
Check this out:
http://www.willamette.edu/~gorr/classes/cs449/intro.html

K means Analysis on KDD Cup Dataset 99

What kind of knowledge/ inference can be made from k means clustering analysis of KDDcup99 dataset?
We plotted some graphs using MATLAB; they look like this:
Experiment 1: Plot of dst_host_count vs serror_rate
Experiment 2: Plot of srv_count vs srv_serror_rate
Experiment 3: Plot of count vs serror_rate
I just extracted some features from the KDD Cup data set and plotted them.
The main problem I am facing is that, due to a lack of domain knowledge, I can't determine what inference can be drawn from these graphs; also, if I have chosen the wrong axes, what would the correct features be?
I have very little time to complete this, so I don't understand the background very well.
Any help interpreting these graphs would be appreciated.
What kind of unsupervised learning can be done using this data and these plots?
Just to give you some domain knowledge: the KDD Cup data set contains information about different aspects of network connections. Each sample contains 'connection duration', 'protocol used', 'source/destination byte size', and many other features that describe one connection. Now, some of these connections are malicious. The malicious samples have their own unique 'fingerprint' (a unique combination of feature values) that separates them from the good ones.
What kind of knowledge/ inference can be made from k means clustering analysis of KDDcup99 dataset?
You can try k-means clustering to initially separate the normal and bad connections. Also, the bad connections themselves fall into 4 main categories. So you can try k = 5, where one cluster will capture the good connections and the other 4 the 4 malicious categories. Look at the first section of the tasks page for details.
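As a sketch of that step, here is a bare-bones k-means in plain Python. The feature extraction from the KDD Cup records, the choice of k = 5, and any preprocessing are up to you; for real work you would use MATLAB's kmeans or an equivalent library routine:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    # Bare-bones k-means: pick k random points as initial centers, then
    # alternate assignment and center update for a fixed number of
    # iterations. `points` are plain numeric feature vectors.
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, clusters
```

After clustering, you would inspect which of the 5 clusters the known-malicious samples land in.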
You can also check if some dimensions in your data set have high correlation. If so, then you can use something like PCA to reduce some dimensions. Look at the full list of features. After PCA, your data will have a simpler representation (with less number of dimensions) and might give better performance.
What should be the correct chosen feature?
This is hard to tell. The data is currently very high-dimensional, so I don't think visualizing 2 or 3 of the dimensions in a graph will give you a good heuristic for which dimensions to choose. I would suggest:
Use all the dimensions for training and testing the model. This will give you a measure of the best achievable performance.
Then try removing one dimension at a time to see how much the performance is affected. For example, you remove the dimension 'srv_serror_rate' from your data and the model performance comes out to be almost the same. Then you know this dimension is not giving you any important info about the problem at hand.
Repeat step two until you can't find any dimension that can be removed without hurting performance.
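The leave-one-out loop above can be sketched generically. Here `evaluate` is a stand-in for whatever train-and-test routine you use (it takes a list of feature names and returns a performance score), so the helper is illustrative rather than specific to KDD Cup:

```python
def ablation_ranking(evaluate, feature_names):
    # For each feature, measure how much performance drops when it is
    # removed; a drop near zero suggests the feature carries little
    # information for the task.
    baseline = evaluate(feature_names)
    drops = {}
    for f in feature_names:
        reduced = [g for g in feature_names if g != f]
        drops[f] = baseline - evaluate(reduced)
    return baseline, drops
```

You would then repeatedly remove the feature with the smallest drop and re-run, stopping once every remaining removal hurts performance.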

Lucas Kanade Optical Flow, Direction Vector

I am working on optical flow, and based on the lecture notes here and some samples on the Internet, I wrote this Python code.
All code and sample images are there as well. For small displacements of around 4-5 pixels, the direction of vector calculated seems to be fine, but the magnitude of the vector is too small (that's why I had to multiply u,v by 3 before plotting them).
Is this because of the limitation of the algorithm, or error in the code? The lecture note shared above also says that motion needs to be small "u, v are less than 1 pixel", maybe that's why. What is the reason for this limitation?
#belisarius says: "LK uses a first order approximation, and so (u,v) should ideally be << 1; if not, higher order terms dominate the behavior and you are toast."
A standard conclusion from the optical flow constraint equation (OFCE, slide 5 of your reference) is that "your motion should be less than a pixel, lest higher order terms kill you". While technically true, you can overcome this in practice by using larger averaging windows. This requires that you do sane statistics, i.e. not the pure least squares suggested in the slides. Equally fast computation, and far superior results, can be achieved with Tikhonov regularization. This requires setting a tuning value (the Tikhonov constant), which can be a global constant or adjusted according to local information in the image (such as the Shi-Tomasi confidence, a.k.a. the structure tensor determinant).
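To make the regularized step concrete, here is a sketch of the windowed Lucas-Kanade solve with a Tikhonov term added to the diagonal of the normal equations. It is plain Python for illustration: `Ix`, `Iy`, `It` are the spatial and temporal derivative samples collected over one averaging window, and `lam` is the tuning constant discussed above:

```python
def lk_flow_tikhonov(Ix, Iy, It, lam=1e-3):
    # Windowed Lucas-Kanade with Tikhonov regularization: solve
    # (A^T A + lam * I) [u, v]^T = -A^T b, where each row of A is
    # (Ix_i, Iy_i) and b_i = It_i, summed over the window.
    sxx = sum(ix * ix for ix in Ix) + lam
    syy = sum(iy * iy for iy in Iy) + lam
    sxy = sum(ix * iy for ix, iy in zip(Ix, Iy))
    sxt = sum(ix * it for ix, it in zip(Ix, It))
    syt = sum(iy * it for iy, it in zip(Iy, It))
    det = sxx * syy - sxy * sxy  # lam > 0 keeps this well-conditioned
    u = (sxy * syt - syy * sxt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v
```

With lam = 0 this reduces to the plain least-squares LK step; increasing lam trades accuracy for robustness in windows where the structure tensor is nearly singular.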
Note that this does not replace the need for multi-scale approaches in order to deal with larger motions. It may extend the range a bit for what any single scale can deal with.
Implementations, visualizations, and code are available in tutorial format here, albeit in MATLAB rather than Python.