Classification in R using Naive Bayes

I would like to know how to implement a Naive Bayes classifier in R.
I am stuck on the following:
How do I split a data file into training and test sets when the file is in CSV format, has 8 variables, and more than 700 rows?
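Here is a minimal sketch of one common approach, assuming the file is called mydata.csv and the class label lives in a column named Class (both names are placeholders), using the e1071 package:

# read the data and make sure the label is a factor
library(e1071)
data <- read.csv("mydata.csv")        # 8 variables, 700+ rows
data$Class <- as.factor(data$Class)

# random 70/30 train/test split
set.seed(42)
train_idx <- sample(nrow(data), size = round(0.7 * nrow(data)))
train <- data[train_idx, ]
test  <- data[-train_idx, ]

# train Naive Bayes on the training part, evaluate on the held-out part
model <- naiveBayes(Class ~ ., data = train)
pred  <- predict(model, test)
table(predicted = pred, actual = test$Class)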

Related

How to train large dataset for classification in MATLAB

I have a large feature dataset of around 111 MB for classification, with 217000 data points where each point has 1760000 features. When used to train an SVM in MATLAB, it takes a lot of time.
How can this data be processed in MATLAB?
It depends on what sort of SVM you are building.
As a rule of thumb, with such big feature sets you should look at linear classifiers, such as an SVM with a linear kernel (or no kernel), or logistic regression with various regularizations, etc.
If you're training an SVM with a Gaussian kernel, the training algorithm has O(max(n,d) * min(n,d)^2) complexity, where n is the number of examples and d the number of features. In your case d > n, so it ends up being O(d*n^2), which is quite big.
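To make the linear-classifier suggestion concrete, here is a minimal sketch in R (the question is about MATLAB, but the idea carries over, e.g. via liblinear's MATLAB interface); the toy matrix stands in for the real data:

library(LiblineaR)

set.seed(1)
x <- matrix(rnorm(200 * 50), nrow = 200)      # 200 examples, 50 features
y <- factor(sample(c(-1, 1), 200, replace = TRUE))

# type = 2 is an L2-regularized L2-loss linear SVC; type = 0 would give
# L2-regularized logistic regression instead
model <- LiblineaR(data = x, target = y, type = 2, cost = 1)
pred  <- predict(model, x)$predictions
mean(pred == y)                               # training accuracy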

Kmeans clustering on large dataset in Matlab

I have the Epinions dataset, which has 290000 columns and 22166 rows.
It is a large dataset (340 MB): when I open the .mat file it takes about 30 minutes to load in MATLAB, and when I run my clustering code it never produces any output. My whole problem is the size of the dataset; I tested it with the R programming language and it was the same. Is there any way to compress the dataset? What can I do now? Thanks.
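One common workaround, sketched here in R to match the rest of this page, is to run k-means on a random subsample and then assign the remaining rows to the nearest centroid; the dimensions below are placeholders, not the real 22166 x 290000 matrix:

set.seed(3)
X <- matrix(rnorm(22166 * 50), nrow = 22166)  # stand-in for the real data

idx <- sample(nrow(X), 5000)                  # cluster only a subsample
km  <- kmeans(X[idx, ], centers = 10, nstart = 5)

# assign every row to its nearest centroid
nearest <- function(x, centers) which.min(colSums((t(centers) - x)^2))
clusters <- apply(X, 1, nearest, centers = km$centers)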

Multi-class classification for large database (matlab)

Can you suggest any MATLAB implementation of a multi-class classification algorithm for a large database? I tried libsvm; it is good except for large databases. As for liblinear, I cannot use it for multi-class classification.
If you want to use liblinear for multi-class classification, you can use the one-vs-all (one-vs-rest) technique, sketched below.
But if you have a large database, then using an SVM is not recommended, as the runtime complexity of SVM training is O(N^2 * m), where
N = number of samples in the data
m = number of features in the data
So, alternatively, you can use a neural network; you can start with nntool, available in MATLAB.
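A hand-rolled one-vs-rest sketch in R with the LiblineaR package, using logistic regression (type = 0) so each binary model yields class probabilities; x_train, y_train and x_test are placeholders for the real data:

library(LiblineaR)

train_ovr <- function(x, y) {
  # one binary model per class: "this class" vs. everything else
  lapply(levels(y), function(cl) {
    target <- factor(ifelse(y == cl, "pos", "rest"), levels = c("pos", "rest"))
    LiblineaR(data = x, target = target, type = 0)
  })
}

predict_ovr <- function(models, classes, x) {
  # take P("pos") from each binary model, predict the highest-scoring class
  scores <- sapply(models, function(m)
    predict(m, x, proba = TRUE)$probabilities[, "pos"])
  classes[max.col(scores)]
}

# usage:
# models <- train_ovr(x_train, y_train)
# preds  <- predict_ovr(models, levels(y_train), x_test)

Note that liblinear itself also handles multi-class problems directly (it applies one-vs-rest internally), so the loop above mainly makes the technique explicit.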

Lip Reading classification on LiLiR dataset in matlab

I work on lip reading, but I am a newbie.
After googling, I found that one of the lip-reading datasets is the LiLiR dataset. I downloaded it and I want to classify it using a support vector machine (SVM), but each letter has a data matrix with 4800 rows and 21 to 28 columns. I do not know what the columns mean. They are features, but which features?
A1_Faye_lips = load('\data set\avletters\avletters\Lips\A1_Faye-lips.mat')
A1_Faye_lips =
vid: [4800x21 double]
siz: [60 80 21]
>>
How can I train an SVM using this 2D matrix?
21 is the number of features. I didn't look into the data source, so I cannot tell exactly what those features are, but they are possibly the independent variables that influence the output (lip reading). Each variable is a 4800×1 vector, or equivalently a 60×80 array.
For training your data, libsvm is a good SVM toolbox for you.
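A rough sketch in R (the question is MATLAB, but in R the R.matlab package can read the .mat file). Following the reading above, the rows of vid are the observations and its 21 columns are the features, which is already the sample-by-feature layout an SVM expects; the labels here are dummies, since the real letter labels have to come from the dataset:

library(R.matlab)
library(e1071)

m   <- readMat("A1_Faye-lips.mat")
vid <- m$vid                          # 4800 x 21: one row per observation

# placeholder labels purely for illustration; replace with the real ones
labels <- factor(rep(c("A", "notA"), length.out = nrow(vid)))

model <- svm(x = vid, y = labels, kernel = "radial")
pred  <- predict(model, vid)
mean(pred == labels)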

Good results with NN, not with SVM; cause for concern?

I have painstakingly gathered data for a proof-of-concept study I am performing. The data consists of 40 different subjects, each with 12 parameters measured at 60 time intervals, and 1 output parameter that is either 0 or 1. So I am building a binary classifier.
I knew beforehand that there is a non-linear relation between the input parameters and the output, so a simple perceptron or Bayes classifier would be unable to classify the sample. This assumption proved correct after initial tests.
Therefore I went to neural networks, and as I hoped the results were pretty good. An error of about 1-5% is generally the result. The training is done by using 70% of the data for training and 30% for evaluation. Running the complete dataset (100%) through the model again, I was very happy with the results. The following is a typical confusion matrix (P = positive, N = negative):
      P    N
P    13    2
N     3   42
So I am happy, and given that I used 30% of the data for evaluation, I am confident that I am not fitting noise.
Therefore I turned to an SVM for a double check, but the SVM was unable to converge to a good solution. Most of the time the solutions were terrible (say, 90% error...). Maybe I am not fully aware of SVMs, or the implementations are not correct, but it troubles me because I thought that when an NN provides a good solution, SVMs are most of the time better at separating the data due to their maximum-margin hyperplane.
What does this say about my result? Am I fitting noise? And how do I know if this is a correct result?
I am using Encog for the calculations, but the NN results are comparable to those of home-grown NN models I made.
If it is your first time using an SVM, I strongly recommend you take a look at A Practical Guide to Support Vector Classification, by the authors of the well-known SVM package libsvm. It gives a list of suggestions for training your SVM classifier:
1. Transform data to the format of an SVM package
2. Conduct simple scaling on the data
3. Consider the RBF kernel
4. Use cross-validation to find the best parameters C and γ
5. Use the best parameters C and γ to train on the whole training set
6. Test
In short, try scaling your data and carefully choosing the kernel plus the parameters.
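A minimal R/e1071 sketch of that recipe (the poster uses Encog/Java; the placeholder data here just illustrates the steps):

library(e1071)

set.seed(7)
x <- matrix(rnorm(40 * 12), nrow = 40)   # toy stand-in for the real features
y <- factor(sample(0:1, 40, replace = TRUE))

x <- scale(x)                            # step 2: simple scaling

# steps 3-4: RBF kernel, cross-validated grid search over C and gamma
tuned <- tune.svm(x, y, kernel = "radial",
                  gamma = 10^(-3:1), cost = 10^(-1:2))

# steps 5-6: the best model (retrained with the best C and gamma), then test
best <- tuned$best.model
pred <- predict(best, x)
table(predicted = pred, actual = y)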