Single Neuron Neural Network - Types of Questions? [closed]

Can anybody think of a real(ish) world example of a problem that can be solved by a single neuron neural network? I'm trying to think of a trivial example to help introduce the concepts.

Using a single neuron for classification is basically logistic regression, as Gordon pointed out.
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more metric (interval or ratio scale) independent variables. (statisticssolutions)
This is a good case to apply logistic regression:
Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1); win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively and whether or not the candidate is an incumbent. (ats)
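For illustration, here is a minimal sketch of that election example with scikit-learn; the dataset and feature values below are invented purely for demonstration.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: campaign spending (in $100k), hours of negative campaigning, incumbent (0/1)
X = np.array([[5.0, 120, 1],
              [1.2,  30, 0],
              [3.5,  80, 1],
              [0.8,  10, 0],
              [4.1, 150, 0],
              [2.0,  60, 1]])
y = np.array([1, 0, 1, 0, 1, 1])  # 1 = won the election, 0 = lost

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[2.5, 50, 0]])[0, 1])  # predicted probability of winning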

For a single-neuron network, I find solving logic functions a good example. Assuming, say, a sigmoid neuron, you can demonstrate how the network solves the AND and OR functions, which are linearly separable, and how it fails to solve the XOR function, which is not; a sketch of this is given below.
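A minimal sketch of that demonstration, assuming a plain NumPy implementation of a single sigmoid neuron trained with gradient descent on the four possible 2-bit inputs:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_single_neuron(X, y, lr=0.5, epochs=5000):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)            # forward pass
        grad = p - y                      # dLoss/dz for cross-entropy loss
        w -= lr * X.T @ grad / len(y)     # gradient step on the weights
        b -= lr * grad.mean()             # gradient step on the bias
    return (sigmoid(X @ w + b) > 0.5).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(train_single_neuron(X, np.array([0, 0, 0, 1])))  # AND -> [0 0 0 1]
print(train_single_neuron(X, np.array([0, 1, 1, 1])))  # OR  -> [0 1 1 1]
print(train_single_neuron(X, np.array([0, 1, 1, 0])))  # XOR -> fails (not linearly separable)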


Which activation function should be used at intermediate Layers in a Time Series Prediction Task [closed]

I want to build a time series prediction model using an LSTM.
Which activation function should be used in the intermediate layers?
Is a linear activation function good for the final (output) layer?
I am normalising my input data to the range (0, 1) and applying the inverse normalisation after prediction.
Here is My Model:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

model = Sequential()
# First LSTM layer: input is (timesteps, features); return_sequences=True passes the full sequence on.
model.add(LSTM(32, input_shape=(input_n, n_features), return_sequences=True, activation='relu'))
# Second LSTM layer: input_shape is only needed on the first layer, so it is omitted here.
model.add(LSTM(32, return_sequences=True, activation='relu'))
model.add(Dense(output_n))
model.add(Activation("linear"))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
Here I have used 'relu' in the intermediate layers and a linear activation at the output layer.
Is this approach correct, or should I also try tanh and sigmoid in the intermediate layers?
What will happen if I don't use any activation function in the intermediate layers; will the LSTM take care of this?
The LSTM already uses tanh and sigmoid activation functions internally for its gate calculations.
Word of warning: this is my subjective impression, which is mostly (but not completely) backed by scientific research.
I can verify that ReLU and its variants (PReLU, Leaky ReLU, etc.) have produced the best results for me in the past.
Which of those implementations will produce the best results for you is probably best determined by trying them out, if you can afford to do so.
ReLU is generally a good default activation function for deep learning models: it outputs max(0, x), which adds non-linearity while keeping the gradient from vanishing for positive inputs. Note that, unlike sigmoid, it does not squash values into the range [0, 1].
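If you can afford it, a straightforward (hypothetical) way to compare activations is to train the same architecture with each candidate and keep the one with the lowest validation loss. The data and dimensions below are made-up placeholders for your own sequences:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Placeholder data: 200 sequences of 10 timesteps with 4 features, predicting 1 value per timestep
input_n, n_features, output_n = 10, 4, 1
X_train = np.random.rand(200, input_n, n_features)
y_train = np.random.rand(200, input_n, output_n)

results = {}
for act in ('tanh', 'relu', 'sigmoid'):
    model = Sequential()
    model.add(LSTM(32, input_shape=(input_n, n_features), return_sequences=True, activation=act))
    model.add(LSTM(32, return_sequences=True, activation=act))
    model.add(Dense(output_n, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    history = model.fit(X_train, y_train, epochs=5, validation_split=0.2, verbose=0)
    results[act] = min(history.history['val_loss'])
print(results)  # pick the activation with the lowest validation loss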

In this case, what's better: classification or clustering? [closed]

I collected data from different sources (Facebook, Twitter, LinkedIn), then put it into a structured format. As a result I now have a CSV file with 10,000 rows (10,000 people), and the associated data covers their names, ages, interests, and buying habits.
I'm really stuck on this step: CLASSIFICATION or CLUSTERING. For classification I don't really have predefined classes or a model for classifying my users.
For clustering: I started calculating similarities and running KMeans, but still can't get the result I wanted. How can I decide what to choose before moving on to the next step, collaborative filtering?
First of all, you have to understand that clustering is a pre-processing activity/task. The idea in clustering is to identify objects with similar properties and group them. The clustering process can be understood in terms of cattle herding, where a herder gathers loose cattle (read: data points) into groups.
Note: the partitioning family of clustering algorithms includes k-means, k-modes, k-prototypes, etc. K-means works only for numerical data, k-modes only for categorical data, and k-prototypes for both numerical and categorical data.
Question: Is the data preprocessed? If the answer is no, you may try the following steps:
Are the column values all categorical (text), all numerical, or mixed?
a. If all categorical, then discretize, bin, or interval-scale them.
b. If mixed, then discretize, bin, or interval-scale the categorical values only.
c. Perform missing-value and outlier treatment for both numerical and categorical data. This will help retain maximum variance as well as reduce dimensionality.
d. Normalize the numerical values to a median of zero.
Now apply a suitable clustering algorithm (based on your problem) to determine patterns. Once you have found the patterns, you can label them. After the identified patterns are labelled, a classification algorithm can be used to classify any new incoming data points into an appropriate class; a rough sketch of this pipeline follows below.
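As a sketch of that cluster-then-label-then-classify pipeline (assuming X is an already preprocessed, purely numerical feature matrix for the 10,000 users; the numbers of clusters and features are placeholders):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(10000, 8)                  # placeholder for the real user features

scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Step 1: clustering as a pre-processing step to discover groups
cluster_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_scaled)

# Step 2: after inspecting and naming the clusters, use the labels to train a classifier
clf = RandomForestClassifier(random_state=0).fit(X_scaled, cluster_labels)

# Step 3: assign a new incoming user to one of the discovered groups
new_user = np.random.rand(1, 8)
print(clf.predict(scaler.transform(new_user)))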

Backpropagation Optimization: How do I use the derivatives for optimizing the weights and biases? [closed]

Given the derivative of the cost function with respect to the weights or biases of the neurons of a neural network, how do I adjust these neurons to minimize the cost function?
Do I just subtract the derivative, multiplied by a constant, from the individual weight and bias? If constants are involved, how do I know what is reasonable to pick?
You're right about how to perform the update. This is what is done in gradient descent in its various forms. Learning rates (the constant you are referring to) are generally very small, on the order of 1e-6 to 1e-8. There are numerous articles on the web covering both of these concepts.
In the interest of a direct answer though, it is good to start out with a small learning rate (on the order suggested above) and check that the loss is decreasing (via plotting). If the loss decreases, you can raise the learning rate a bit; I recommend raising it by 3x its current value. For example, if it is 1e-6, raise it to 3e-6 and check again that your loss is still decreasing. Keep doing this until the loss is no longer decreasing nicely. The learning-rate-versus-loss plot from Stanford's CS231n lecture series gives good intuition for how learning rates affect the loss curve.
You want to raise the learning rate so that the model doesn't take as long to train. You don't want to raise the learning rate too much, because then it is possible to overshoot the local minimum you're descending towards and for the loss to increase. This is an oversimplification, because the loss landscape of a neural network is very non-convex, but this is the general intuition.
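A minimal sketch of that update rule on a toy quadratic cost, where the derivative is known in closed form:

import numpy as np

def cost(w):
    return ((w - 3.0) ** 2).sum()        # minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)               # derivative of the cost w.r.t. w

w = np.array([0.0])
learning_rate = 0.1                      # the "constant" from the question
for step in range(100):
    w -= learning_rate * grad(w)         # subtract the derivative times the learning rate

print(w, cost(w))                        # w approaches 3, the cost approaches 0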

Questions about word embedding (word2vec) [closed]

I am trying to understand the word2vec (word embedding) architecture, and I have a few questions about it:
First, why is the word2vec model considered a log-linear model? Is it because it uses a softmax at the output layer?
Second, why does word2vec remove the hidden layer? Is it just because of computational complexity?
Third, why does word2vec not use an activation function (as compared to the NNLM, the Neural Network Language Model)?
First, why is the word2vec model considered a log-linear model? Is it because it uses a softmax at the output layer?
Exactly: softmax is a log-linear classification model. The intent is to obtain values at the output that can be interpreted as a posterior probability distribution.
Second, why does word2vec remove the hidden layer? Is it just because of computational complexity?
Third, why does word2vec not use an activation function (as compared to the NNLM, the Neural Network Language Model)?
I think your second and third questions are linked, in the sense that an extra hidden layer and an activation function would make the model more complex than necessary. Note that while no activation is explicitly formulated, we could consider it to be a linear classification function. It appears that the dependencies the word2vec models try to capture can be achieved with a linear relation between the input words.
Adding a non-linear activation function would allow the neural network to map more complex functions, which could in turn lead it to fit the input onto something more complex that does not retain the dependencies word2vec seeks.
Also note that linear outputs don't saturate which facilitates gradient-based learning.
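To make the log-linear point concrete, here is a minimal sketch of the skip-gram scoring step: the score for a context word is just a dot product of two embedding vectors pushed through a softmax, with no hidden layer or non-linearity in between. The vocabulary size, dimensions, and word index below are made up:

import numpy as np

vocab_size, dim = 10, 5
W_in = np.random.randn(vocab_size, dim) * 0.01    # input (centre word) embeddings
W_out = np.random.randn(vocab_size, dim) * 0.01   # output (context word) embeddings

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

centre_word = 3
scores = W_out @ W_in[centre_word]     # linear in the parameters
p_context = softmax(scores)            # posterior distribution over context words
print(p_context)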

Can I use neural network in this case? [closed]

Can I use neural networks (or an SVM, etc.) if my output is a vector of 27,680 values, all of which are zero except for a single one?
I mean: is it right to do this?
When I use an SVM I get this error:
Error using seqminopt>seqminoptImpl (line 198)
No convergence achieved within maximum number of iterations.
SVMs are usually binary classifiers. Basically that means that they separate your data points into two groups, which signals whether a data point does or doesn't belong to a class. Common strategies for solving multi-class problems with SVMs are one-vs-rest and one-vs-one. In the case of one-vs-rest, you would train one classifier per class, which would be 27,680 for you. In the case of one-vs-one, you would train (K choose 2) = K(K-1)/2 classifiers, so in your case around 383 million. As you can see, both numbers are rather high, so I would be pessimistic about your chances of successfully solving your problem with SVMs.
Nevertheless, you can try to increase the maximum number of iterations as described in another Stack Overflow thread. Maybe it still works.
You can use Neural Nets for your task and a 1-of-K output is nothing unusual. However, even with only one hidden layer of 500 neurons (and using the input and output vector sizes mentioned in your comment) you will have (27680*2*500) + (500*27680) = 41,520,000 weights in your network. So I would expect rather long training times (although a Google employee would probably laugh about these numbers). You will also most likely need a lot of training examples, unless your input is really simple.
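As a rough sketch of such a network in Keras (the input size of 2 * 27,680 = 55,360 follows the weight count above and is an assumption, as is the choice of ReLU in the hidden layer):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(500, activation='relu', input_shape=(55360,)))   # 55,360 * 500 weights (plus biases)
model.add(Dense(27680, activation='softmax'))                    # 500 * 27,680 weights (plus biases)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()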
As an alternative you might look into Decision Trees/Random Forests, Naive Bayes or kNN.