Why does one independent variable get dropped in SPSS multiple regression?

I was running a linear multiple regression as well as a logistic multiple regression in SPSS.
When looking at the results afterwards, I realised that in each regression one independent variable had been automatically excluded by SPSS. Did I do something wrong here, or what do I have to do in order to have all independent variables included in the regression?
Thanks for your help!

In LOGISTIC REGRESSION you can declare a predictor variable as categorical and thus use the single categorical variable directly. If you do this, you can specify the type of contrast coding to use and choose which category serves as the reference category.
The REGRESSION procedure doesn't have facilities for declaring predictors categorical, so if you have an intercept or constant in the model (which of course is the default) and you try to enter K dummy or indicator variables for a K-level categorical variable, one of them will be linearly dependent on the intercept and the other K-1 dummies and, as Kevin said, will be left out.
You may be confused about which one the procedure leaves out. When the default ENTER method is used in REGRESSION, it tries to enter all the independent variables, but it does so one variable at a time and performs a tolerance check to avoid the numerical problems associated with redundant or nearly redundant variables. The way it does this results in the last predictor specified being the first to enter; after that comes the variable with the smallest absolute correlation with the previously entered variable, then the remaining candidate with the least relationship with the two already entered, and so on. In cases where there are ties on tolerance values, the later-listed variable is entered.
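The collinearity itself is easy to reproduce outside SPSS. Here is a minimal sketch in Python with pandas (my own illustration, not SPSS output) of why K dummies plus an intercept are redundant, and of the usual K-1 coding with a reference category:

```python
import pandas as pd

df = pd.DataFrame({"group": ["a", "b", "c", "a", "b", "c"]})

# All K dummies: together with a constant they are linearly dependent,
# because the K columns always sum to 1, i.e. to the intercept column.
full = pd.get_dummies(df["group"])
print(full.sum(axis=1).unique())   # [1]

# The usual fix: K-1 dummies, with the dropped level as the reference.
reduced = pd.get_dummies(df["group"], drop_first=True)
print(reduced.columns.tolist())    # ['b', 'c']; 'a' is the reference level
```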

Related

Must the single predictor in the best-fitting one-predictor model also appear in the best-fitting two-predictor model?

In best subset selection, we know the algorithm selects the model with the smallest residual sum of squares at every model size.
The question is whether the single predictor variable in the best-fitting one-predictor model must also be selected in the best-fitting two-predictor model.
I see that it is true in many real cases, but according to the algorithm it seems this need not hold.
I was wondering if there is any statistical argument showing that it is true.
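One way to probe this is empirically. The sketch below (my own construction, not from the question) enumerates all subsets by least squares on synthetic data in which one predictor is a noisy proxy for the sum of two others; in such correlated designs the best one-predictor and best two-predictor models need not be nested:

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
X[:, 0] = X[:, 1] + X[:, 2] + 0.5 * rng.normal(size=n)  # proxy for x1 + x2
y = X[:, 1] + X[:, 2] + rng.normal(size=n)

def rss(cols):
    # residual sum of squares of OLS with an intercept and the given columns
    A = np.column_stack([np.ones(n), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return r @ r

best1 = min(combinations(range(p), 1), key=rss)  # likely the proxy, column 0
best2 = min(combinations(range(p), 2), key=rss)  # likely columns 1 and 2
print(best1, best2, "nested?", set(best1) <= set(best2))
```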

When should I use transformations of input variables in a neural network?

I have run an ANN in MATLAB to predict a variable based on several predictor variables. All variables have numerical values. I could not get desirable results, although I changed the number of hidden neurons several times over many runs of the model, and so on. My question is: should I transform the input variables to get better results? How can I know which transformation to choose? Thanks for any help.
I strongly advise you to use some methods from time series analysis, such as lagged correlation or windowed lagged correlation (with statistical tests). You can find them in most statistical packages (e.g. in R). From one small picture it's hard to deduce whether your prediction is lagged or not. Testing a large amount of data can help you reveal true dependencies and avoid trusting spurious correlations.
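The answer mentions R; for concreteness, here is the same idea as a small Python sketch on synthetic data (the variable names and the lag of 3 are assumptions for illustration):

```python
import numpy as np

def lagged_corr(x, y, max_lag=10):
    """Pearson correlation of x[t] with y[t - k] for k = 0..max_lag."""
    out = {}
    for k in range(max_lag + 1):
        a = x[k:] if k else x
        b = y[:-k] if k else y
        out[k] = np.corrcoef(a, b)[0, 1]
    return out

rng = np.random.default_rng(1)
y = rng.normal(size=500)
x = np.roll(y, 3) + 0.5 * rng.normal(size=500)  # x follows y with lag 3

# the correlation should peak at k = 3, revealing the lagged dependency
print(lagged_corr(x, y))
```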

Sigmoid in a backpropagation neural network

I'm trying to create a sample neural network that can be used for credit scoring. Since this is a complicated structure for me, I'm trying to learn on something small first.
I created a network using backpropagation: an input layer (2 nodes), 1 hidden layer (2 nodes + 1 bias), and an output layer (1 node), with the sigmoid as the activation function for all layers. I'm trying to test it first on a^2 + b^2 = c^2, which means my inputs would be a and b, and the target output would be c.
My problem is that my input and target output values are real numbers which can range over (−∞, +∞). So when I'm passing these values to my network, my error function would be something like (target − network output). Would that be correct or accurate? In the sense that I'm taking the difference between the network output (which ranges from 0 to 1) and the target output (which is a large number).
I've read that the solution would be to normalise first, but I'm not really sure how to do this. Should I normalise both the input and target output values before feeding them to the network? Which normalisation function is best to use, since I have read about several different methods? After getting the optimised weights and using them to test some data, I'm getting an output value between 0 and 1 because of the sigmoid function. Should I revert the computed values to the un-normalised/original form? Or should I only normalise the target output and not the input values? This has had me stuck for weeks, as I'm not getting the desired outcome and am not sure how to incorporate the normalisation idea into my training algorithm and testing.
Thank you very much!!
So, to answer your questions:
The sigmoid function squashes its input to the interval (0, 1). It's usually useful in classification tasks because you can interpret its output as the probability of a certain class. Your network performs a regression task (you need to approximate a real-valued function), so it's better to use a linear activation on the output of your last hidden layer (in your case also the first :) ); see the sketch below.
I would advise you not to use the sigmoid function as the activation in your hidden layers. It's much better to use tanh or ReLU nonlinearities. A detailed explanation (as well as some useful tips if you want to keep the sigmoid as your activation) can be found here.
It's also important to understand that the architecture of your network is not suitable for the task you are trying to solve. You can learn a little about what different networks are able to learn here.
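A minimal numpy sketch of the suggested setup for regression (layer sizes are illustrative assumptions, not your exact network): a bounded nonlinearity in the hidden layer, but an identity activation on the output so predictions are unbounded.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(4)  # 2 inputs -> 4 hidden
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)  # 4 hidden -> 1 output

def forward(X):
    h = np.tanh(X @ W1 + b1)  # tanh nonlinearity in the hidden layer
    return h @ W2 + b2        # identity (linear) output: any real value

X = rng.normal(size=(5, 2))
print(forward(X))             # real-valued outputs, not squashed to (0, 1)
```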
Regarding normalization: the main reason why you should normalize your data is to avoid giving spurious prior knowledge to your network. Consider two variables: age and income. The first varies from, say, 5 to 90; the second from, say, 1000 to 100000. The mean absolute value is much bigger for income than for age, so because of the linear transformations in your model, the ANN treats income as more important at the beginning of training (due to the random initialization). Now suppose you are trying to solve a task where you need to classify whether a given person has grey hair :) Is income truly the more important variable for this task?
There are a lot of rules of thumb on how you should normalize your input data. One is to squash all inputs to the [0, 1] interval. Another is to give every variable mean = 0 and sd = 1. I usually use the second method when the distribution of a given variable is similar to a normal distribution, and the first in other cases.
As for the output, it's usually also useful to normalize it when you are solving a regression task (especially in the multiple regression case), but it's not as crucial as for the inputs.
You should remember to keep the parameters needed to restore the original scale of your inputs and outputs. You should also remember to compute them only on the training set, and then apply them to the training, test and validation sets alike.
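Here is a sketch of that workflow using scikit-learn (an assumption about tooling; any equivalent scaling code works): fit the scalers on the training data only, reuse them on the test data, and keep the target scaler around to map predictions back to the original units.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(80, 3)), rng.normal(size=(20, 3))
y_train = rng.normal(size=(80, 1))

x_scaler = MinMaxScaler()    # squash inputs to [0, 1] ...
y_scaler = StandardScaler()  # ... or standardize to mean 0, sd 1

X_tr = x_scaler.fit_transform(X_train)  # parameters from training data only
X_te = x_scaler.transform(X_test)       # the same parameters are reused here
y_tr = y_scaler.fit_transform(y_train)

# after training, undo the target scaling to report original units
y_pred_scaled = y_tr[:5]                # stand-in for the model's output
print(y_scaler.inverse_transform(y_pred_scaled))
```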

Logistic Regression with variables that do not vary

A few questions around constant variables and logistic regression:
Let's say I have a continuous variable that takes only one value across the whole data set. I know I should ideally eliminate the variable, since it brings no predictive value. Instead of doing this manually for each feature, does logistic regression set the coefficient of such variables to 0 automatically?
If I use such a variable (one that has only one value) in logistic regression with L1 regularization, will the regularization force the coefficient to 0?
Along similar lines: suppose I have a categorical variable with 3 levels (the first level spans, say, 60% of the data set, the second 35%, and the third 5%), and I split the data into training and test sets. There is a good chance that the third level may not end up in the test set, leaving us with a level that is present in the training set but missing from the test set. How do I handle such scenarios? Does regularization take care of things like this automatically?
Regarding question 3)
If you want to be sure that both the training and test sets contain samples of each level of the categorical variable, you can simply divide each subgroup into test and training portions and then combine these again (i.e. a stratified split; see the sketch below).
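In scikit-learn (an assumption about tooling) this per-level split is a single argument:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
levels = rng.choice(["a", "b", "c"], size=1000, p=[0.60, 0.35, 0.05])

X_train, X_test, lv_train, lv_test = train_test_split(
    X, levels, test_size=0.25, stratify=levels, random_state=0
)
# every level, including the rare 5% one, appears in both halves
print(np.unique(lv_train), np.unique(lv_test))
```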
Regarding questions 1) and 2)
The coefficient for a variable with zero variance should be zero, yes. However, whether such a coefficient will "automatically" be set to zero or the variable excluded from the regression depends on the implementation.
If you implement logistic regression yourself, you can post the code and we can discuss it specifically.
I recommend finding an implemented version of logistic regression and testing it on toy data. Then you will have your answer as to whether or not the coefficient is set to zero (which I assume it will be).
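Following that advice, here is one such toy check with scikit-learn (an assumption about tooling, and a made-up data set):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X[:, 2] = 1.0                                   # zero-variance column
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

# saga leaves the intercept unpenalized, so any weight on the constant
# column can be shifted into the intercept; L1 then zeroes it out.
clf = LogisticRegression(penalty="l1", solver="saga", max_iter=5000).fit(X, y)
print(clf.coef_)  # expect the last coefficient at (essentially) zero
```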

Solving multiple identical ODE systems that influence each other

I'm using MATLAB's ODE suite to solve a system of 13 differential equations that determine the behavior of a neuron. Now I'd like to add a second neuron that is governed by the same set of differential equations but is influenced by the first neuron. More importantly, this second neuron will also influence the first neuron. (Feedforward and feedback between these two cells.)
Is there a convenient way to do this? Can I distribute the differential equations over two function files, or do I have to copy them below the original ones, so that there is a longer list of equations in the same file? I'd like to have one file per cell and to keep this organized somehow. (Also in case I might want to expand it again to three or four neurons.)
If my question is in any way unclear or not specific enough, please say so. I'll try to explain what I'm doing better.
You have to split the big vector of all variables into sub-arrays holding the variables of each neuron, call each file with its variables, and then concatenate the resulting derivative vectors.
If the neurons behave similarly, you should think about having one method (file) with a loop inside for the neuron-internal part of the derivative, followed by (probably) a double loop for the interaction terms. Put the connectivity information into a data structure so that you stay flexible in changing it; a sketch of this structure follows below.
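For concreteness, here is that structure sketched in Python/SciPy (the MATLAB version is analogous); the per-neuron dynamics and the coupling are placeholders, not a real neuron model:

```python
import numpy as np
from scipy.integrate import solve_ivp

N_VARS = 13                   # state variables per neuron
W = np.array([[0.0, 0.3],     # W[i, j]: influence of neuron j on neuron i
              [0.5, 0.0]])

def neuron_rhs(t, y, drive):
    """Placeholder single-neuron dynamics; 'drive' is the synaptic input."""
    dy = -y                   # toy leak term standing in for the 13 equations
    dy[0] += drive            # coupling enters through the first variable
    return dy

def coupled_rhs(t, y):
    n = len(y) // N_VARS
    parts = y.reshape(n, N_VARS)       # split into per-neuron sub-arrays
    drives = W @ parts[:, 0]           # each neuron sees a weighted sum
    dydt = [neuron_rhs(t, parts[i], drives[i]) for i in range(n)]
    return np.concatenate(dydt)        # stitch back into one long vector

y0 = np.zeros(2 * N_VARS)
y0[0] = 1.0
sol = solve_ivp(coupled_rhs, (0.0, 10.0), y0)
print(sol.y.shape)
```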
I have little experience with MATLAB, but one way that I've seen this done in MATLAB is by creating a list (a 1-D matrix) for each state variable that requires storage. For example, implementing the Hodgkin-Huxley neuron would require one matrix each for the gating variables 'm', 'h', and 'n', as well as one for 'V'. Each list is as long as the number of neurons in the simulation, and the ith position in each list corresponds to the ith neuron.
The flow of the simulation would look like the following (let N be the number of neurons):
For each time step in the simulation,
1) let 'index = 1';
2) call the system of ODEs in your file using, as arguments, the element at the current index from each list/matrix of state variables;
3) add one to the index. If the index is now greater than N, move the time step forward by one and start over at (1).
It sounds like you'll also need matrices to store information about how the neurons influence one another. While I know a lot of people do it this way, it seems cumbersome on a larger scale (especially if you ever incorporate neurons with different sets of ODEs). In the long run, I highly recommend migrating to a more object-oriented approach: it gives you a far easier way to 'bind' each instance of a neuron to its variables and equations, and creating any number of neurons then requires no additional code. (A sketch of the idea follows after the link below.)
http://www.mathworks.com/discovery/object-oriented-programming.html
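A minimal sketch of that object-oriented idea (my own illustration, in Python; the dynamics are placeholders): each Neuron instance owns its parameters and its right-hand side, so adding neurons means adding instances, not code.

```python
import numpy as np
from scipy.integrate import solve_ivp

class Neuron:
    N_VARS = 13                             # state variables per neuron

    def __init__(self, tau=1.0):
        self.tau = tau                      # per-instance parameter

    def rhs(self, t, y, drive):
        dy = -y / self.tau                  # toy dynamics, not a real model
        dy[0] += drive                      # synaptic input
        return dy

class Network:
    def __init__(self, neurons, W):
        self.neurons, self.W = neurons, W   # W[i, j]: j's influence on i

    def rhs(self, t, y):
        parts = y.reshape(len(self.neurons), Neuron.N_VARS)
        drives = self.W @ parts[:, 0]
        return np.concatenate([nrn.rhs(t, parts[i], drives[i])
                               for i, nrn in enumerate(self.neurons)])

net = Network([Neuron(), Neuron(tau=2.0)],
              np.array([[0.0, 0.3], [0.5, 0.0]]))
y0 = np.zeros(2 * Neuron.N_VARS)
y0[0] = 1.0
sol = solve_ivp(net.rhs, (0.0, 10.0), y0)
print(sol.y.shape)                          # (26, number of time points)
```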