How to recommend new items in a neural network collaborative filtering recommendation system with a softmax output

If each softmax output neuron represents the probability of the user liking that item, how would you use this model to recommend new posts, given that the set of output neurons does not change? Should I use a totally different system if I am trying to recommend new items? How could I structure my model so that it can predict new items?

Related

Is it possible to simultaneously use and train a neural network?

Is it possible to use TensorFlow or some similar library to make a model that you can efficiently train and use at the same time?
An example/use case for this would be a chatbot that you give feedback to, somewhat like how pets learn (i.e. replicating what they just did for a reward), or one to which you can add new entries or new responses it can use.
I think what you are asking is whether a model can be trained continuously without having to retrain it from scratch each time new labelled data comes in.
The answer to that is: online models.
There are models that can be trained continuously on new data without worrying about retraining them from scratch. As per the Wikipedia definition:
Online machine learning is a method of machine learning in which data becomes available in sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once.
Some examples of such algorithms are listed below (a minimal usage sketch follows the list):
BernoulliNB
GaussianNB
MiniBatchKMeans
MultinomialNB
PassiveAggressiveClassifier
PassiveAggressiveRegressor
Perceptron
SGDClassifier
SGDRegressor
DNNs
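As a concrete illustration, here is a minimal sketch of the update pattern using scikit-learn's partial_fit API with SGDClassifier; the data is synthetic and only shows the mechanics of updating an already-fitted model with a new batch rather than a realistic task.

```python
# Minimal sketch of online (incremental) learning with scikit-learn's
# partial_fit API, using SGDClassifier. The data is synthetic and only
# illustrates the update pattern.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()

# First batch: the full set of classes must be declared up front so that
# later batches may contain only a subset of them.
X0 = rng.normal(size=(100, 5))
y0 = rng.integers(0, 2, size=100)
model.partial_fit(X0, y0, classes=np.array([0, 1]))

# Later, as new labelled data arrives, update the same model in place
# instead of retraining from scratch.
X1 = rng.normal(size=(20, 5))
y1 = rng.integers(0, 2, size=20)
model.partial_fit(X1, y1)

print(model.predict(X1[:5]))
```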

Can I train Word2vec using a Stacked Autoencoder with non-linearities?

Every time I read about Word2vec, the embedding is obtained with a very simple Autoencoder: just one hidden layer, linear activation for the initial layer, and softmax for the output layer.
My question is: why can't I train some Word2vec model using a stacked Autoencoder, with several hidden layers with fancier activation functions? (The softmax at the output would be kept, of course.)
I never found any explanation about this, therefore any hint is welcome.
Word vectors are nothing but the hidden states of a neural network trying to get good at something.
To answer your question: of course you can.
If you are going to do it, why not use fancier networks/encoders as well, like a BiLSTM or a Transformer?
This is what the people who created things like ELMo and BERT did (though their networks were a lot fancier).
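As a rough illustration of what "fancier" could mean, here is a minimal sketch in PyTorch of a skip-gram-style model with an extra non-linear hidden layer between the embedding and the softmax output; the vocabulary size, dimensions and training pairs are made up for illustration.

```python
# Sketch of a skip-gram-style model with an extra non-linear hidden layer
# on top of the embedding, instead of the purely linear projection used by
# vanilla word2vec. Vocabulary size, dimensions and the (centre, context)
# pairs below are placeholders.
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 1000, 64, 128

model = nn.Sequential(
    nn.Embedding(vocab_size, emb_dim),   # the word vectors live here
    nn.Linear(emb_dim, hidden_dim),
    nn.Tanh(),                           # the extra non-linearity
    nn.Linear(hidden_dim, vocab_size),   # scores over context words
)
loss_fn = nn.CrossEntropyLoss()          # softmax over the vocabulary
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake (centre word, context word) index pairs.
centre = torch.randint(0, vocab_size, (256,))
context = torch.randint(0, vocab_size, (256,))

for _ in range(5):
    logits = model(centre)
    loss = loss_fn(logits, context)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The learned embeddings are still just the first layer's weight matrix.
word_vectors = model[0].weight.detach()
```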

How to jointly learn two tasks at prediction level?

I have trained a network on two different modalities of the same image. I pass the data together in one layer, but after that it is pretty much two networks in parallel: they don't share a layer, and the two tasks have different sets of labels, so I have two different loss and accuracy layers (I use caffe, btw). I would like to learn these tasks jointly. For example, the prediction of a class in task 1 should be higher when task 2 predicts a certain class label. I don't want to join them at the feature level but at the prediction level. How do I do this?
Why don't you want to join the predictions at the feature level?
If you really want to stick to your idea of not joining any layers of the network, you can apply a CRF or SVM on top of the overall prediction pipeline to learn cross-correlations between the predictions. For any other method you will need to combine features inside the network, one way or another. However, I would strongly recommend that you consider doing this: it is a general theme in deep learning that doing things inside the network works better than doing them outside.
From what I have learned by experimenting with joint prediction, you will get the most performance gain if you share weights between all convolutional layers of the network. You can then apply independent fc-layers, followed by a softmax regression and separate loss functions, on top of the jointly predicted features. This allows the network to learn cross-correlations between features while still being able to make separate predictions.
Have a look at my MultiNet paper as a good starting point. All our training code is on GitHub.
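To make the suggested layout concrete, here is a minimal sketch of a shared convolutional backbone with two independent fc heads and a summed loss. It is written in PyTorch for brevity rather than the question's caffe setup (and is not the MultiNet code itself); layer sizes and class counts are placeholders.

```python
# Sketch of the suggested layout: a shared convolutional backbone, then
# independent fully-connected heads with separate softmax losses whose
# sum is back-propagated jointly. Sizes and class counts are placeholders.
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    def __init__(self, n_classes_task1=10, n_classes_task2=5):
        super().__init__()
        self.backbone = nn.Sequential(        # weights shared by both tasks
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head1 = nn.Linear(32, n_classes_task1)  # task-specific fc layers
        self.head2 = nn.Linear(32, n_classes_task2)

    def forward(self, x):
        features = self.backbone(x)
        return self.head1(features), self.head2(features)

model = TwoHeadNet()
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a fake batch with one label set per task.
x = torch.randn(8, 3, 64, 64)
y1 = torch.randint(0, 10, (8,))
y2 = torch.randint(0, 5, (8,))

logits1, logits2 = model(x)
loss = loss_fn(logits1, y1) + loss_fn(logits2, y2)  # joint objective
opt.zero_grad()
loss.backward()
opt.step()
```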

Use a trained neural network to imitate its training data

I'm in the early stages of designing a prose imitation system. It will read a bunch of prose, then mimic it. It's mostly for fun, so the mimicking prose doesn't need to make too much sense, but I'd like to make it as good as I can, with a minimal amount of effort.
My first idea is to use my example prose to train a classifying feed-forward neural network, which classifies its input as either part of the training data or not part. Then I'd like to somehow invert the neural network, finding new random inputs that also get classified by the trained network as being part of the training data. The obvious and stupid way of doing this is to randomly generate word lists and only output the ones that get classified above a certain threshold, but I think there is a better way, using the network itself to limit the search to certain regions of the input space. For example, maybe you could start with a random vector and do gradient descent optimisation to find a local maximum around the random starting point. Is there a word for this kind of imitation process? What are some of the known methods?
How about Generative Adversarial Networks (GAN, Goodfellow 2014) and their more advanced siblings like Deep Convolutional Generative Adversarial Networks? There are plenty of proper research articles out there, and also more gentle introductions like this one on DCGAN and this on GAN. To quote the latter:
GANs are an interesting idea that were first introduced in 2014 by a group of researchers at the University of Montreal led by Ian Goodfellow (now at OpenAI). The main idea behind a GAN is to have two competing neural network models. One takes noise as input and generates samples (and so is called the generator). The other model (called the discriminator) receives samples from both the generator and the training data, and has to be able to distinguish between the two sources. These two networks play a continuous game, where the generator is learning to produce more and more realistic samples, and the discriminator is learning to get better and better at distinguishing generated data from real data. These two networks are trained simultaneously, and the hope is that the competition will drive the generated samples to be indistinguishable from real data.
(DC)GAN should fit your task quite well.
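For a feel of the training loop, here is a minimal GAN sketch in PyTorch. The "real data" is just a shifted Gaussian so the example stays self-contained; for prose you would replace the small dense networks and the data with something text-appropriate.

```python
# Minimal GAN sketch: a generator maps noise to samples and a discriminator
# learns to tell them from real data. "Real data" here is a shifted 2-D
# Gaussian, used only so the loop runs end to end.
import torch
import torch.nn as nn

noise_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(200):
    real = torch.randn(64, data_dim) + 3.0          # stand-in training data
    fake = G(torch.randn(64, noise_dim))

    # Discriminator step: push real samples towards 1, generated ones towards 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator output 1 on fakes.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```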

Neural Network: extrapolate between two trainings

Is it possible to train a network with 2 inputs: one is the data and the other is a constant that we define?
We train the network with one set of data and set the second input to '10', for example; then, once it has converged, we train with another set of data and set the second input to '20' this time.
Now, what if I input test data with the second parameter set to '15'? Will it automatically extrapolate between the two learned states?
If not, what do I do if I want to do what I explained above, i.e. extrapolate between two training states?
Thanks a lot,
Jeff
It is possible to add another input as a parameter to the neural network, but I am unsure what benefit you are trying to achieve by adding this input.
You would need to train the neural network with this input so that it learns to estimate values between the two trained states. This would involve training each individual network first and then training a second network that extrapolates between them.
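As a sketch of the first idea (an extra conditioning input), here is a minimal PyTorch example in which the constant is concatenated with the data features, the model is trained on examples tagged with 10 and with 20, and is then queried at 15. All shapes and data are made up; whether the in-between prediction is sensible depends entirely on how the targets actually vary with that parameter.

```python
# Sketch of a network with a scalar conditioning input concatenated to the
# data features. Data, shapes and the target values are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4 + 1, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake data for the two regimes, tagged with their condition value.
x10, y10 = torch.randn(100, 4), torch.randn(100, 1)
x20, y20 = torch.randn(100, 4), torch.randn(100, 1)
X = torch.cat([x10, x20])
cond = torch.cat([torch.full((100, 1), 10.0), torch.full((100, 1), 20.0)])
y = torch.cat([y10, y20])

for _ in range(200):
    pred = model(torch.cat([X, cond], dim=1))
    loss = loss_fn(pred, y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Query in between the two trained conditions.
with torch.no_grad():
    test = model(torch.cat([torch.randn(5, 4), torch.full((5, 1), 15.0)], dim=1))
```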
If you are trying to modularise each trained Neural Network for specific roles or classifications, and these classifications represent some kind of continuous relationship (for example, weather predictions that specialise for no rain, light rain, moderate rain and heavy rain), then perhaps this input could be used in some way to encourage the output of a particular network.
If you would like to adjust the weights of each network so that some networks have more preference than others, perhaps an ensemble approach can assist, with different weights for each network (static and dynamic options are available). If you just want to map the differences between the two networks with different weights, perhaps a linear or non-linear function can be applied between the two networks and mapped to detail the changes between the two trained networks.