In this case, what's better: classification or clustering? [closed]

I collected data from different sources (FB, Twitter, LinkedIn) and put it into a structured format. As a result, I now have a CSV file with 10,000 rows (one per person), containing each person's name, age, interests, and buying habits.
I'm really stuck on this step: CLASSIFICATION or CLUSTERING. For classification, I don't really have predefined classes or a model with which to classify my users.
For clustering, I started calculating similarities and running KMeans, but I still can't get the result I want. How can I decide which to choose before moving on to the next step, collaborative filtering?

First of all, you have to understand that clustering is a pre-processing activity/task. The idea in clustering is to identify objects with similar properties and group them. The clustering process can be understood in terms of cattle herding, where the herder drives loose cattle (read: data points) into groups.
Note: the partitioning family of clustering algorithms includes k-means, k-modes, k-prototypes, etc. k-means works only for numerical data, k-modes works only for categorical data, and k-prototypes works for both numerical and categorical data.
Question: Is the data preprocessed? If the answer is no, then you may try the following steps (a minimal sketch follows the list):
Is the data (column values) all in categorical (= text) format, all numerical, or mixed?
a. If all categorical, then discretize, bin, or interval-scale them.
b. If mixed, then discretize, bin, or interval-scale the categorical values only.
c. Perform missing-value and outlier treatment for both numerical and categorical data. This helps retain maximum variance as well as reduce dimensionality.
d. Normalize the numerical values to a median of zero.
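Here is a minimal sketch of those preprocessing steps, assuming pandas/scikit-learn and hypothetical column names (age, interests, habits):

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

df = pd.read_csv("users.csv")        # hypothetical file name
num_cols = ["age"]                   # hypothetical numeric columns
cat_cols = ["interests", "habits"]   # hypothetical categorical columns

# c. missing-value treatment (median for numeric, most frequent for categorical)
df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
df[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])

# a/b. turn the categorical text values into integer codes
df[cat_cols] = OrdinalEncoder().fit_transform(df[cat_cols])

# d. center and scale the numeric values (zero mean here, rather than the
# zero median mentioned above; swap in a median-based scaler if preferred)
df[num_cols] = StandardScaler().fit_transform(df[num_cols])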
Now apply a suitable clustering algorithm (based on your problem) to determine patterns. Once you have found the patterns, you may label them. Once the identified patterns are labelled, a classification algorithm can then be used to classify any new incoming data points into the appropriate class.
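As a hedged sketch of that cluster-then-label-then-classify workflow (scikit-learn assumed, continuing from the preprocessed df above; the segment names are purely illustrative):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = df[["age", "interests", "habits"]].values    # preprocessed features

# 1. cluster to discover groups of similar users (5 is an arbitrary guess)
km = KMeans(n_clusters=5, n_init=10, random_state=0)
segments = km.fit_predict(X)

# 2. inspect the clusters and assign human-readable labels (hypothetical)
label_names = {0: "bargain hunters", 1: "early adopters", 2: "loyalists",
               3: "window shoppers", 4: "impulse buyers"}
y = np.array([label_names[s] for s in segments])

# 3. train a classifier on the now-labelled data for new incoming users
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(clf.predict(X[:1]))                        # classify a (stand-in) new user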

Caffe CNN: diversity of filters within a conv layer [closed]

I have the following theoretical questions regarding the conv layer in a CNN. Imagine a conv layer with 6 filters (the conv1 layer and its 6 filters in the figure).
1) What guarantees the diversity of learned filters within a conv layer? I mean, how does the learning (optimization) process make sure that it does not learn the same (similar) filters?
2) Is diversity of filters within a conv layer a good thing or not? Is there any research on this?
3) During the learning (optimization) process, is there any interaction between the filters of the same layer? If yes, how?
1.
Assuming you are training your net with SGD (or a similar backprop variant), the fact that the weights are initialized at random encourages them to be diverse: since the gradient of the loss w.r.t. each different random filter is usually different, the gradients "pull" the weights in different directions, resulting in diverse filters.
However, there is nothing that guarantees diversity. In fact, filters sometimes become tied to each other (see GrOWL and references therein) or drop to zero.
2.
Of course you want your filters to be as diverse as possible to capture all sorts of different aspects of your data. Suppose your first layer only has filters responding to vertical edges: how is your net going to cope with classes containing horizontal edges (or other types of textures)?
Moreover, if you have several filters that are the same, why compute the same responses twice? That is highly inefficient.
3.
Using "out-of-the-box" optimizers, the learned filters of each layer are independent of each other (linearity of gradient). However, one can use more sophisticated loss functions/regularization methods to make them dependent.
For instance, using group Lasso regularization, can force some of the filters to zero while keeping the others informative.
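A small sketch of the symmetry point from part 1, assuming PyTorch: if two filters start out identical and the loss treats their output channels symmetrically, they receive identical gradients and never diverge, so random initialization is what makes them different in the first place.

import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 6, kernel_size=3, bias=False)
with torch.no_grad():
    conv.weight[1].copy_(conv.weight[0])      # tie filters 0 and 1 at init

opt = torch.optim.SGD(conv.parameters(), lr=0.1)
for _ in range(20):
    x = torch.randn(8, 1, 16, 16)
    loss = conv(x).relu().sum()               # treats all channels symmetrically
    opt.zero_grad()
    loss.backward()
    opt.step()

# identical init + identical gradients => the tied filters never separate,
# while filters that started out different stay different
print(torch.allclose(conv.weight[0], conv.weight[1]))   # True
print(torch.allclose(conv.weight[0], conv.weight[2]))   # False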

k-means clustering for Testing data classification [closed]

I want to do k-means clustering to classify testing data based on training data, both of which have 3 classes (1, 2 and 3).
How would I classify the testing data set using a cluster count of e.g. k=10 in kmeans (e.g. using Matlab)? I know that I can use k=3 and then use nearest neighbour to label the data based on its nearest cluster... but I'm not sure what I would do for values other than k=3. How would you label each of those 10 clusters?
Thanks
The classification of 10 clusters would be no different than the classification of 3 clusters. The number of clusters given by k-means is independent of the number of "classes" in the data. k-means is an unsupervised learning algorithm, meaning that it gives no consideration to the class of the training data during training.
The algorithm would look something like this:
distances = dist(test_point, cluster_centers)   # one distance per cluster center
cluster = clusters[argmin(distances)]           # nearest cluster, by index (not min!)
class = mode(cluster.classes)                   # majority class label within it
where we find the cluster with minimum distance between the cluster center and our test point, then we find the most common class label among the elements contained in that minimally-distant cluster.
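A runnable version of that outline, assuming scikit-learn and NumPy, where X_train, y_train (labels in {1, 2, 3}) and X_test are your data arrays:

import numpy as np
from sklearn.cluster import KMeans

km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_train)

# label each of the 10 clusters with the majority class of its training points
cluster_to_class = np.array([
    np.bincount(y_train[km.labels_ == c]).argmax() for c in range(10)
])

# a test point gets the class of its nearest cluster center
y_pred = cluster_to_class[km.predict(X_test)]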
It is a little bit unclear what exactly you want to do, but here is an outline from what I understand.
When you are clustering data, labels are ideally not present: you either use the clustering to get insights from the data or use it for pre-processing.
However, if you want to perform a clustering and then assign a class id to a new datapoint based on the nearness of the cluster centers, then you can do the following.
First, select k by bootstrapping or other methods, for example silhouette coefficients. Once you have the cluster centers, check which center is closest to the new datapoint and assign the class id accordingly.
In such cases you might be interested in the Rand index or the adjusted Rand index to assess cluster quality.
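For the k-selection step, a minimal sketch using scikit-learn's silhouette coefficient (data matrix X assumed):

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# pick the k whose clustering has the highest mean silhouette coefficient
scores = {}
for k in range(2, 11):                  # candidate values of k
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)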

Single Neuron Neural Network - Types of Questions? [closed]

Can anybody think of a real(ish) world example of a problem that can be solved by a single neuron neural network? I'm trying to think of a trivial example to help introduce the concepts.
Using a single neuron for classification is basically logistic regression, as Gordon pointed out.
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more metric (interval or ratio scale) independent variables. (statisticssolutions)
This is a good case to apply logistic regression:
Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1); win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively and whether or not the candidate is an incumbent. (ats)
For a single-neuron network, I find solving logic functions a good example. Assuming, say, a sigmoid neuron, you can demonstrate how the network solves the AND and OR functions, which are linearly separable, and how it fails to solve the XOR function, which is not.
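A minimal sketch of that demonstration (NumPy assumed): one sigmoid neuron trained by gradient descent on the logistic loss fits AND and OR but cannot fit XOR. With all-zero initialization the XOR gradient is exactly zero by symmetry, so the neuron never moves.

import numpy as np

def train_neuron(X, y, lr=0.5, epochs=5000):
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))       # sigmoid activation
        w -= lr * Xb.T @ (p - y) / len(y)       # logistic-loss gradient step
    return np.round(1.0 / (1.0 + np.exp(-Xb @ w)))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(train_neuron(X, np.array([0, 0, 0, 1])))  # AND: [0. 0. 0. 1.]
print(train_neuron(X, np.array([0, 1, 1, 1])))  # OR:  [0. 1. 1. 1.]
print(train_neuron(X, np.array([0, 1, 1, 0])))  # XOR: stuck at p = 0.5, wrong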

Combining an image classifier and an expert system

Would it be accurate to include an expert system in an image classifying application? (I am working with Matlab, have some experience with image processing and no experience with expert systems.)
What I'm planning on doing is adding an extra feature vector that is actually an answer to a question. Is this fine?
For example: Assume I have two questions that I want the answers to: Question 1 and Question 2. Knowing the answers to these two questions should help classify the test image more accurately. I understand expert systems are coded differently from an image classifier, but my question is: would it be wrong to include the answers to these two questions in numerical form (1 can be yes, and 0 can be no) and pass this information along with the other feature vectors into a classifier?
If it matters, my current classifier is an SVM.
Regarding training images: yes, they too will be trained with the 2 extra feature vectors.
Converting a set of comments to an answer:
A similar question on Cross Validated already explains that it can be done as long as the data is properly preprocessed.
In short: you can combine them as long as the training (and testing) data is properly preprocessed (e.g. standardized). Standardization improves the performance of most linear classifiers because it scales the variables so they have a similar weight in the learning process, and it improves numerical stability (and performance) when variables are sampled from Gaussian-like distributions (which is achieved by standardization).
With that, if continuous variables are standardized and categorical variables are encoded as (-1, +1), the SVM should work well. Whether or not it improves the performance of the classifier depends on the quality of those categorical variables.
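A minimal sketch of that combination, assuming scikit-learn (the image features and the two yes/no answers are placeholder arrays):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# image_feats: (n, d) continuous image features; answers: (n, 2) in {0, 1}
image_feats_std = StandardScaler().fit_transform(image_feats)
answers_pm1 = 2 * answers - 1                  # encode yes/no as +1/-1

X = np.hstack([image_feats_std, answers_pm1])  # combined feature matrix
clf = SVC(kernel="linear").fit(X, y)           # y: the image class labels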
Answering the other question in the comment: when using a kernel SVM with, for example, a chi-square kernel, the rows of the training data are supposed to behave like histograms (all positive and usually L1-normalized), and therefore introducing a (-1, +1) feature breaks the kernel. With an RBF kernel the rows of the data are supposed to be L2-normalized, and again, introducing (-1, +1) features might introduce unexpected behaviour (I'm not very sure what exactly the effect would be...).
I worked on a similar problem. If multiple features can be extracted from your images, then you can train a different classifier using each feature set. You can think of these classifiers as experts answering questions based on the features they were trained on. Instead of using labels as outputs, it is better to use confidence values; uncertainty can be very important in this setting. You can use these experts to generate values, and these values can be combined and used to train another classifier.
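That expert idea is essentially stacking; a hedged sketch with scikit-learn (feats_a and feats_b stand for two different feature sets extracted from the same images):

import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

# each expert is trained on its own feature set and outputs confidences
expert_a = SVC(probability=True).fit(feats_a, y)
expert_b = SVC(probability=True).fit(feats_b, y)

conf = np.hstack([expert_a.predict_proba(feats_a),
                  expert_b.predict_proba(feats_b)])

# a second-level classifier learns how to combine the experts; in practice
# use out-of-fold confidences (e.g. cross_val_predict) to avoid overfitting
combiner = LogisticRegression().fit(conf, y)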

LibSVM: -wi option (weight selection) during cross-validation and testing [closed]

I need help with the weight option of libSVM. I'm confused on one point: should we also use the -wi option while doing cross-validation? If so, should we use the weights calculated from the whole data set, or the weights calculated from the v-1 subsets (for v-fold cross-validation)? My second question is: should we use the -wi option during prediction? If so, should we use the weights calculated during training, or should we calculate the weights according to the distribution of negative and positive instances in the test data?
For example, say we have 50 positive and 200 negative data points. So after calculating the best c and gamma parameter values we will use the -w1 4 -w-1 1 options while training. But what about training during grid search and cross-validation? Let's say we are performing 5-fold cross-validation. While training on each set of 4 remaining subsets, the distribution of negative and positive instances will probably change. So should we recalculate the weights during this 5-fold cross-validation?
Besides, should we use the -w1 4 -w-1 1 options while testing?
Thanks
To answer your first question: if you are applying non-trivial weights to a subset of classes during model training, then you should do the same throughout your training/tuning, which includes the cross-validation-based tuning of C and gamma (otherwise you would be tuning the model based on a cost-sensitive objective/risk/loss function that is different from the one you are actually specifying by enabling non-trivial class weights).
The class weights are external to libSVM in the sense that they are not calculated by libSVM: that command-line option allows users to set their own class weights to emphasize/reduce the importance of a subset of classes. Some people tune the class weights as well, but that is a different story.
As for prediction, the class weights are not used there explicitly, since they come in as a "tweak" to the objective/risk/loss function during the model training/tuning stage, so the resulting model is already "aware" of the weights.
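To illustrate, a hedged sketch in scikit-learn terms (its SVC wraps libsvm, and class_weight plays the role of -wi): the same fixed weights apply inside every cross-validation fold of the grid search and in the final model, while prediction takes no weight option at all.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# fixed class weights, analogous to -w1 4 -w-1 1 (50 positives, 200 negatives)
weights = {1: 4, -1: 1}

grid = GridSearchCV(
    SVC(class_weight=weights),          # the same weights in every CV fold
    param_grid={"C": [1, 10, 100], "gamma": [0.01, 0.1, 1]},
    cv=5,
)
grid.fit(X_train, y_train)              # tunes C and gamma with the weights on
y_pred = grid.predict(X_test)           # no weight option at predict time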