User classification in RapidMiner: output the predicted user for supplied test data

How can I use RapidMiner to run a classifier on test data and classify a user based on that data? I need it to output who the classified user is, not the classifier's performance. Any help would be greatly appreciated.

I found the answer to my question!
You just have to build an example (row) with attributes (column headers) and feed it to the Apply Model operator. Make sure you remove the label (the attribute you want predicted) from that example.
The result is that row with an added attribute called Prediction.

Related

DeepLearning4J - Acquiring Data and Train Model

I am trying to create the simplest possible neural network and train it with some data.
For that I created a test.csv with the following pattern:
number,number+1;
number2,number2+1
...
I am trying to do a linear regression with the network.
But I cannot find a way to load the data; DataSetIterator does not work.
How do I fit the model to the data, and how do I test it?
In our examples, we encourage people to use DataVec + RecordReaderDataSetIterator.
DataVec has all of the various data loading components.
I'm not sure what you mean by "DataSetIterator does not work" without seeing any code, but it seems like you didn't really look at our examples.
There are multiple examples in there of a CSV record reader you can use for both regression and classification use cases.
Consider reorienting your data pipeline to use those.
Those examples are always found here:
https://github.com/deeplearning4j/dl4j-examples
If you follow any of those, the same pattern emerges:
Record reader for whatever data format -> RecordReaderDataSetIterator
The iterator's constructors let you specify common options, such as whether the task is a regression or not, which column holds your label, etc.

How to use Spark MLlib/Pipelines to build one model per user [duplicate]

This question already has an answer here: Run ML algorithm inside map function in Spark (1 answer). Closed 4 years ago.
I want to train a different model for each user in my dataset. Is there built-in support for that in Spark MLlib/Pipelines?
If not, what's the easiest/cleanest way to train a separate model for each user?
Unfortunately, Spark ML does not provide a built-in "one model per user" concept, but you can implement custom logic for it. I see two possible ways of solving this task.
The first approach follows the steps below (the details are only an example; your steps will differ, but the logic will be similar):
First, obtain the training data for the specific user (e.g. read a CSV file from HDFS, S3, etc.).
Then, train a model on that user's data. For example, suppose your dataset has two columns, a specific criterion X and the user's productivity Y, where Y varies between user groups. You could train a model, for instance with LinearRegression, to predict whether the user can finish the work in time or not.
Finally, save the trained model to disk under a name that depends on the user's id, group, etc.
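The steps above can be sketched in a few lines. This is only an illustration in plain Python rather than Spark, assuming rows of the form (user_id, X, Y); the sample rows, the file naming scheme and the helper names fit_line, train_per_user and predict are all hypothetical, and a hand-written least-squares fit stands in for Spark's LinearRegression.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def train_per_user(rows, model_dir):
    """Group rows by user id, fit one model per user, save each to disk."""
    by_user = defaultdict(list)
    for user_id, x, y in rows:
        by_user[user_id].append((x, y))
    paths = {}
    for user_id, points in by_user.items():
        # The file name depends on the user id (hypothetical naming scheme).
        path = os.path.join(model_dir, f"model_{user_id}.pkl")
        with open(path, "wb") as fh:
            pickle.dump(fit_line(points), fh)
        paths[user_id] = path
    return paths


def predict(model_path, x):
    """Load one user's saved model and apply it to a new X value."""
    with open(model_path, "rb") as fh:
        model = pickle.load(fh)
    return model["slope"] * x + model["intercept"]


# Hypothetical rows: (user_id, criterion X, productivity Y)
rows = [
    ("alice", 1.0, 2.0), ("alice", 2.0, 4.0), ("alice", 3.0, 6.0),
    ("bob", 1.0, 5.0), ("bob", 2.0, 5.0), ("bob", 3.0, 5.0),
]
model_dir = tempfile.mkdtemp()
paths = train_per_user(rows, model_dir)
alice_pred = predict(paths["alice"], 4.0)
bob_pred = predict(paths["bob"], 4.0)
```

In Spark the same shape would apply: filter or group the data by user, fit one estimator per group, and persist each fitted model under a user-specific path.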
The second approach is to train a single model that applies to every user. You must choose the algorithm's options so that it does not depend on the user group; in other words, generalize the training to all user groups. In this case there is no "one model per user" separation. If the second variant is too complicated to implement on your dataset, follow the first approach.

Usage of a Naive Bayes model for prediction

Hi all, I am new to Scala and Spark MLlib.
I have a dataset of diseases along with their symptoms, in the following format:
Disease,symptom1 symptom2 symptom3
I have almost 300 entries in the above format in a CSV file.
I want to achieve the following functionality:
If a user gives an input of symptoms, namely Symptom1, Symptom2, Symptom3, the model must be able to predict the disease.
I have the following questions:
Which machine learning model should I use to achieve this functionality?
I have gone through some models and found the Naive Bayes model; correct me if I am wrong.
Can I provide text input to a Naive Bayes model?
Is there any sample code available to achieve this functionality?
You can use any of the classification algorithms available in Spark MLlib. For further reference, read the official docs and go through this post from the Databricks blog: https://databricks.com/blog/2015/07/29/new-features-in-machine-learning-pipelines-in-spark-1-4.html
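On the question of text input: Naive Bayes works on token counts, so the whitespace-separated symptom string has to be split into tokens first. Below is a minimal sketch of a multinomial Naive Bayes with add-one smoothing, written in plain Python rather than Spark MLlib. The sample rows and the function names train_naive_bayes and predict_disease are made up for illustration, and the rows are assumed to follow the "Disease,symptom1 symptom2 symptom3" format from the question.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows):
    """rows: iterable of (disease, "symptom1 symptom2 ...") pairs."""
    class_counts = Counter()             # number of rows per disease
    token_counts = defaultdict(Counter)  # symptom counts per disease
    vocabulary = set()
    for disease, symptoms in rows:
        tokens = symptoms.lower().split()
        class_counts[disease] += 1
        token_counts[disease].update(tokens)
        vocabulary.update(tokens)
    return class_counts, token_counts, vocabulary


def predict_disease(model, symptoms):
    """Return the disease with the highest log-posterior score."""
    class_counts, token_counts, vocabulary = model
    total_rows = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for disease, n_rows in class_counts.items():
        score = math.log(n_rows / total_rows)  # log prior
        denom = sum(token_counts[disease].values()) + len(vocabulary)
        for token in symptoms.lower().split():
            if token in vocabulary:  # symptoms never seen in training are skipped
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((token_counts[disease][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = disease, score
    return best_label


# Hypothetical training rows in the "Disease,symptoms" shape.
rows = [
    ("flu", "fever cough fatigue"),
    ("flu", "fever headache cough"),
    ("migraine", "headache nausea"),
    ("migraine", "headache dizziness nausea"),
]
model = train_naive_bayes(rows)
first = predict_disease(model, "fever cough")
second = predict_disease(model, "nausea headache")
```

With Spark MLlib the same idea would be expressed as a pipeline that tokenizes the symptom column, turns the tokens into count vectors, and feeds them to its Naive Bayes classifier.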

data mining project Dilemma

I am researching a dataset consisting of two data files:
The first contains user ids, artist ids, and the rankings users gave to the artists they chose to rank.
The second data file contains artist ids and names.
The research question I have chosen is:
Is the artist popular or not?
In other words, given a new singer who does not appear in the data file, we will use algorithms to classify them and determine whether they are popular or not.
For the prediction step I chose to use logistic regression.
But my problem comes earlier.
I do not know how, technically, to determine which artists in the existing data should be defined as successful and which as unsuccessful.
I thought of some methods, for example k-means with k=2 (but with this method I have a problem with the distance function), kNN with k=2, etc.
I need guidance on how to cluster the existing data, and general tips for the project.
Thank you.
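One possible way to make the k-means idea concrete is to cluster on a single numeric popularity score, so that the distance function is just an absolute difference. The sketch below is an assumption, not a recommendation from the thread: it takes the number of users who ranked an artist as that score and runs a one-dimensional k-means with k=2. The sample rows and the names two_means_1d and label_artists are hypothetical.

```python
from collections import Counter


def two_means_1d(values, iterations=100):
    """1-D k-means with k=2; distance is the absolute difference."""
    low, high = min(values), max(values)  # initial centroids
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:  # all values identical: nothing to split
            break
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if (new_low, new_high) == (low, high):  # converged
            break
        low, high = new_low, new_high
    return low, high


def label_artists(rankings):
    """rankings: iterable of (user_id, artist_id, rank) rows."""
    # Assumed popularity score: how many rows mention each artist.
    counts = Counter(artist_id for _, artist_id, _ in rankings)
    low, high = two_means_1d(list(counts.values()))
    return {
        artist: "popular" if abs(n - high) < abs(n - low) else "unpopular"
        for artist, n in counts.items()
    }


# Hypothetical rows: a1 ranked by 5 users, a2 by 4, a3 and a4 by 1 each.
rankings = (
    [(u, "a1", 5) for u in range(5)]
    + [(u, "a2", 4) for u in range(4)]
    + [(0, "a3", 3), (1, "a4", 2)]
)
labels = label_artists(rankings)
```

The resulting labels could then serve as the target column for the logistic regression step. Note that a brand-new artist has no rankings yet, so the classifier would need other features describing the artist.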

Continue training a Doc2Vec model

Gensim's official tutorial explicitly states that it is possible to continue training a (loaded) model. I'm aware that according to the documentation it is not possible to continue training a model that was loaded from the word2vec format. But even when one generates a model from scratch and then tries to call the train method, it is not possible to access the newly created labels for the LabeledSentence instances supplied to train.
>>> from gensim.models.doc2vec import Doc2Vec, LabeledSentence
>>> sentences = [LabeledSentence(['first', 'sentence'], ['SENT_0']), LabeledSentence(['second', 'sentence'], ['SENT_1'])]
>>> model = Doc2Vec(sentences, min_count=1)
>>> print(model.vocab.keys())
dict_keys(['SENT_0', 'SENT_1', 'sentence', 'first', 'second'])
>>> sentence = LabeledSentence(['third', 'sentence'], ['SENT_2'])
>>> model.train([sentence])
>>> print(model.vocab.keys())
# At this point I would expect the key 'SENT_2' to be present in the vocabulary, but it isn't
dict_keys(['SENT_0', 'SENT_1', 'sentence', 'first', 'second'])
Is it at all possible to continue the training of a Doc2Vec model in Gensim with new sentences? If so, how can this be achieved?
My understanding is that this is not possible for any new labels. We can only continue training when the new data has the same labels as the old data. As a result, we are training or retuning the weights of the already learned vocabulary, but are not able to learn new vocabulary.
There is a similar question for adding new labels/words/sentences during training: https://groups.google.com/forum/#!searchin/word2vec-toolkit/online$20word2vec/word2vec-toolkit/L9zoczopPUQ/_Zmy57TzxUQJ
Also, you might want to keep an eye on this discussion:
https://groups.google.com/forum/#!topic/gensim/UZDkfKwe9VI
Update: If you want to add new words to an already trained model, take a look at online word2vec here:
http://rutumulkar.com/blog/2015/word2vec/
According to the gensim documentation, online/incremental training is not supported for doc2vec.
Refer to https://github.com/RaRe-Technologies/gensim/issues/1019
I could still add new documents to an existing doc2vec model (though it sometimes crashes with a segmentation fault), but the most-similar query does not work on the newly added documents, so this approach seems useless.