I am trying to add noise to the telco dataset in order to compare prediction accuracy levels using neural networks.
I have converted most of the dataset to binary variables (0 or 1) and want to add noise to the entire dataset before training a neural network on it. Is flipping the values the only option, or is there another way?
This is the dataset:
https://www.kaggle.com/datasets/blastchar/telco-customer-churn
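For reference, the flipping option mentioned above could look roughly like the sketch below. The function name `flip_binary_noise`, the `flip_prob` parameter, and the toy rows are assumptions made up for illustration; they are not taken from the telco dataset.

```python
import random

def flip_binary_noise(rows, flip_prob, seed=0):
    """Return a copy of rows where each 0/1 entry is flipped with probability flip_prob."""
    rng = random.Random(seed)  # fixed seed so the noise is reproducible
    return [
        [1 - value if rng.random() < flip_prob else value for value in row]
        for row in rows
    ]

# Hypothetical toy data standing in for the binarised columns.
data = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
noisy = flip_binary_noise(data, flip_prob=0.1)
```

Raising `flip_prob` step by step would give a series of noisier copies to compare accuracy on.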
Related
I am using PySpark to implement a Churn classification model for a business problem and the dataset I have is imbalanced. So when I train the model, I randomly select a dataset with equal numbers of 1's and 0's.
Then I applied the model to real-time data, and the numbers of predicted 1's and 0's were obviously equal.
Now I need to calibrate my trained model, but I couldn't find a way to do it in PySpark. Does anyone have an idea how to calibrate a model in PySpark, maybe with something like CalibratedClassifierCV?
I have a large feature dataset of around 111 Mb for classification, with 217000 data points, each of which has 1760000 features. Training an SVM on it in MATLAB takes a lot of time.
How can this data be processed in MATLAB?
It depends on what sort of SVM you are building.
As a rule of thumb, with such big feature sets you need to look at linear classifiers, such as an SVM with no kernel (i.e. a linear kernel), or logistic regression with various regularizations.
If you're training an SVM with a Gaussian kernel, the training algorithm has O(max(n,d) min(n,d)^2) complexity, where n is the number of examples and d the number of features. In your case d > n, so it ends up being O(dn^2), which is very large.
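To get a feel for the size of that bound, here is a quick back-of-the-envelope calculation using the numbers from the question. It only evaluates the expression inside the O(...); constant factors are ignored, so treat it as an order-of-magnitude sketch rather than a runtime estimate.

```python
n = 217_000      # number of examples, from the question
d = 1_760_000    # number of features, from the question

# Expression inside O(max(n, d) * min(n, d)^2)
cost = max(n, d) * min(n, d) ** 2

# Since d > n, this is the same as d * n^2
assert cost == d * n ** 2

print(f"{cost:.2e}")
```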
I would like to know what max pooling and mean pooling are for recurrent neural networks like LSTM while using them for sentiment analysis.
As far as I know, pooling is mostly used in convolutional neural networks.
It is a method of condensing a larger matrix into a smaller one that retains properties of the original: a smaller window is moved over the original matrix, and the max value or the average value inside the window is selected to form a new, resultant matrix for further computation. Link: https://machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/
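A small NumPy sketch of the window-based pooling described above. The helper `pool2d` and the toy arrays are made up for illustration. The last few lines assume that, for an LSTM, "pooling" means taking the max or mean of the hidden states over the time axis, which is one common usage but is not confirmed by the question.

```python
import numpy as np

def pool2d(matrix, size, mode="max"):
    """Non-overlapping size x size pooling over a 2-D array."""
    h, w = matrix.shape
    # Crop so that both dimensions are divisible by the window size.
    h, w = h - h % size, w - w % size
    blocks = matrix[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 10, 13, 14],
              [11, 12, 15, 16]], dtype=float)

max_pooled = pool2d(x, 2, "max")    # [[4, 8], [12, 16]]
mean_pooled = pool2d(x, 2, "mean")  # [[2.5, 6.5], [10.5, 14.5]]

# Assumed RNN analogue: rows are time steps, columns are hidden units.
hidden_states = np.array([[0.1, -0.5],
                          [0.7, 0.2],
                          [0.3, 0.9]])
max_over_time = hidden_states.max(axis=0)    # [0.7, 0.9]
mean_over_time = hidden_states.mean(axis=0)  # one mean per hidden unit
```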
I am playing with TensorFlow to understand convolutional autoencoders. I have implemented a simple single-layer autoencoder which does this:
Input (Dimension: 95x95x1) ---> Encoding (convolution with 32 5x5 filters) ---> Latent representation (Dimension: 95x95x1x32) ---> Decoding (using tied weights) ---> Reconstructed input (Dimension: 95x95x1)
The inputs are black-and-white edge images i.e. the results of edge detection on RGB images.
I initialised the filters randomly and then trained the model to minimise loss, where loss is defined as the mean-squared-error of the input and the reconstructed input.
loss = 0.5*(tf.reduce_mean(tf.square(tf.sub(x,x_reconstructed))))
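To make the quantity concrete, here is the same expression written in plain NumPy on a tiny made-up example. The arrays and the helper name `half_mse` are assumptions for illustration and are not part of the model above.

```python
import numpy as np

def half_mse(x, x_reconstructed):
    """Half of the mean squared difference between input and reconstruction."""
    return 0.5 * np.mean(np.square(x - x_reconstructed))

x = np.array([[0.0, 1.0], [1.0, 0.0]])
x_rec = np.array([[0.0, 0.5], [1.0, 0.5]])

# Squared errors are [0, 0.25, 0, 0.25]; their mean is 0.125, half of that is 0.0625.
loss_value = half_mse(x, x_rec)
```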
After training with 1000 steps, my loss converges and the network is able to reconstruct the images well. However, when I visualise the learned filters, they do not look very different from the randomly-initialised filters! But the values of the filters change from training step to training step.
Example of learned filters
I would have expected at least horizontal and vertical edge filters. Or if my network was learning "identity filters" I would have expected the filters to all be white or something?
Does anyone have any idea about this? Or are there any suggestions as to what I can do to analyse what is happening? Should I include pooling and depooling layers before decoding?
Thank you!
P/S: I tried the same model on RGB images and again the filters look random (like random blotches of colours).
I am making 8 x 8 tiles of images and I want to train an RBF neural network in MATLAB using those tiles as inputs. I understand that I can convert each matrix into a vector and use that. But is there a way to train on them as matrices (to preserve the locality)? Or is there any other technique to solve this problem?
There is no way to use a matrix as an input to such a neural network, but anyway this won't change anything:
Assume you have any neural network with an image as input, one hidden layer, and the output layer. There will be one weight from every input pixel to every hidden unit. All weights are initialized randomly and then trained using backpropagation. The development of these weights does not depend on any local information; it only depends on the gradient of the output error with respect to the weight. Having a matrix input will therefore make no difference compared to having a vector input.
For example, you could make a vector out of the image, shuffle that vector in any way (as long as you do it the same way for all images) and the result would be (more or less, due to the random initialization) the same.
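A quick NumPy check of the shuffling argument for a single fully connected layer. The sizes (64 inputs for an 8 x 8 tile, 10 hidden units) and the tanh activation are arbitrary choices for illustration. The point is only that permuting the input and the matching weight rows in the same way leaves the hidden activations unchanged, so the layer has no notion of which pixels are neighbours.

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random(64)                # a flattened 8 x 8 tile
weights = rng.normal(size=(64, 10))   # one weight per (pixel, hidden unit) pair
perm = rng.permutation(64)            # a fixed shuffle applied to every image

hidden = np.tanh(image @ weights)

# Shuffle the pixels and the corresponding weight rows in the same way.
hidden_shuffled = np.tanh(image[perm] @ weights[perm])

same = np.allclose(hidden, hidden_shuffled)
```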
The way to handle local structures in the input data is using convolutional neural networks (CNN).