Can BERT be trained on non-text sequence data for classification?

Can BERT be used for non-text sequence data? I want to try BERT for sequence classification problems. The data is not text. In other words, I want to train BERT from scratch. How do I do that?

The Transformer architecture can be used for anything, as long as the input is a sequence of discrete symbols. BERT is trained using the masked language model objective, i.e., it is trained to fill in gaps in a sequence based on the rest of the sequence. If your data is of that kind, you can train a BERT-like model on it. With sequences of continuous vectors, you would need to come up with a suitable alternative to masked language modeling.
You can follow any of the many tutorials that you can find online, e.g., from the Huggingface blog or towardsdatascience.com.
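Training from scratch on non-text data mostly comes down to two pieces of preprocessing: a vocabulary that maps your symbols to integer ids, and the masking step of the masked language model objective. The sketch below shows both in plain Python, assuming your data are sequences of discrete symbols (event codes, discretized sensor states, and so on). The 15% selection rate and the 80/10/10 split follow the original BERT recipe; the special-token ids and helper names are hypothetical. The resulting ids and labels could be fed to a BERT model built from a configuration (for example Hugging Face's `BertConfig` with `BertForMaskedLM`) instead of one loaded from a pretrained checkpoint, and the pretrained encoder then fine-tuned with a classification head.

```python
import random

# Hypothetical special-token ids; your own scheme may differ.
PAD, CLS, MASK = 0, 1, 2
NUM_SPECIAL = 3
IGNORE = -100  # label value ignored by default by PyTorch's CrossEntropyLoss


def build_vocab(sequences):
    """Map every distinct non-text symbol to an integer id after the special ids."""
    symbols = sorted({s for seq in sequences for s in seq})
    return {sym: i + NUM_SPECIAL for i, sym in enumerate(symbols)}


def encode(seq, vocab, max_len):
    """Prepend [CLS] (a classifier head can read its output vector) and pad to max_len."""
    ids = [CLS] + [vocab[s] for s in seq][: max_len - 1]
    return ids + [PAD] * (max_len - len(ids))


def mask_tokens(ids, vocab_size, mask_prob=0.15, rng=random):
    """BERT-style masking. Each ordinary token is selected with probability mask_prob;
    a selected token becomes [MASK] 80% of the time, a random symbol 10% of the time,
    and stays unchanged 10% of the time. Labels hold the original id only where the
    model must make a prediction; everywhere else they are IGNORE."""
    inputs, labels = list(ids), [IGNORE] * len(ids)
    for i, tok in enumerate(ids):
        if tok < NUM_SPECIAL or rng.random() >= mask_prob:
            continue  # never mask special tokens or padding
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.randrange(NUM_SPECIAL, vocab_size)
        # else: keep the original token in place
    return inputs, labels


if __name__ == "__main__":
    # Toy non-text data, e.g. event codes from a log.
    data = [["A", "B", "B", "C"], ["C", "A", "D"]]
    vocab = build_vocab(data)
    vocab_size = NUM_SPECIAL + len(vocab)
    ids = encode(data[0], vocab, max_len=8)
    print(ids)
    print(mask_tokens(ids, vocab_size, rng=random.Random(0)))
```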

Related

Classifying Algorithm for time series data

I have time series data (each instance is 30 seconds long), as shown in the figure, and I would like to know what kind of classification algorithms I could use.
This is how the data looks in the time and frequency domains:
In the image we have 2 classes (one shown in blue and the other in orange). The left side of the image shows the data in the time domain, and the right side shows its Fourier transform.
I am thinking of training an LSTM on the data in both domains, and also of converting the above representations into images and training CNNs on them.
Any suggestions, such as a better algorithm or a better representation of the data, would help.
One architecture suited to your needs is WaveNet.
The WaveNet architecture is designed to handle very long sequences (yours are reasonably long), and the original paper reports that it outperforms LSTM-based RNNs on several tasks.
I am not sure what you mean by
converting the above representations into an image and use CNNs to train
so I would suggest sticking to recurrent models or WaveNet for sequence classification.
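WaveNet reaches far back into long sequences by stacking dilated causal convolutions: each layer only looks at the current and past samples, and the dilation doubles from layer to layer, so the receptive field grows exponentially with depth. The toy single-channel version below is only meant to illustrate that mechanism; it omits WaveNet's gated activations, residual and skip connections, and the pooling plus classifier head you would add for sequence classification. In practice you would use a framework layer, for example Keras `Conv1D` with `padding="causal"` and `dilation_rate`, or PyTorch `Conv1d` with `dilation` plus explicit left padding.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """One-channel causal convolution:
    y[t] = sum_k weights[k] * x[t - k * dilation],
    treating samples before the start of the sequence as zeros (left padding),
    so y[t] never depends on x[t + 1], x[t + 2], ..."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            j = t - k * dilation
            if j >= 0:
                acc += w * x[j]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples (current one included) that a single output depends on."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_stack(x, kernel_weights, dilations):
    """Apply a stack of dilated causal convolutions (linear, for clarity)."""
    for d in dilations:
        x = causal_dilated_conv1d(x, kernel_weights, d)
    return x


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 9
    # With kernel size 2 and dilations 1, 2, 4 the impulse spreads over 8 time steps.
    print(dilated_stack(impulse, [1.0, 1.0], [1, 2, 4]))
    # Ten layers with dilations 1, 2, ..., 512 already cover 1024 samples.
    print(receptive_field(2, [2 ** i for i in range(10)]))  # -> 1024
```

Doubling the dilation is what makes this cheap: covering 1024 samples takes ten layers here, whereas undilated kernel-size-2 convolutions would need 1023 layers for the same reach.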

Implementing Hierarchical Attention for Classification

I am trying to implement the Hierarchical Attention paper for text classification. One of the challenges I am facing is how to manage batching and how the optimizer should update the weights of the network. The network consists of two stacked encoders: a sentence encoder followed by a document encoder.
When the dataset consists of large documents, the following problem arises: for each pass through the document encoder, there are multiple passes through the sentence encoder. When the loss is computed and the optimizer uses the gradients to update the network's parameters, I assume that the weights of the sentence encoder should be updated differently from those of the document encoder. What is a good strategy for this? How could that strategy be implemented in libraries such as Keras or PyTorch?
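One way to look at this: as far as I know, in the Hierarchical Attention Network the sentence-level encoder is a single module whose weights are shared by every sentence, so there is only one set of sentence-encoder parameters however many sentences a document has. Backpropagation then sums the gradient contributions of every sentence pass into those shared weights, and a single optimizer step can update both encoders from the same loss. The toy below checks this with finite differences; its scalar "encoders" (`w` for sentences, `v` for the document) are hypothetical stand-ins, not the paper's GRU-plus-attention layers.

```python
import math


def forward(w, v, xs, y):
    """Toy hierarchy: sentence encoder s_i = tanh(w * x_i) with ONE shared w,
    document encoder d = v * sum_i s_i, squared-error loss (d - y)^2."""
    sents = [math.tanh(w * x) for x in xs]  # same w reused for every sentence
    d = v * sum(sents)
    return (d - y) ** 2, sents, d


def analytic_grads(w, v, xs, y):
    loss, sents, d = forward(w, v, xs, y)
    dL_dd = 2.0 * (d - y)
    dL_dv = dL_dd * sum(sents)
    # The shared weight's gradient is the SUM over all sentence passes:
    # dL/dw = sum_i dL/dd * v * (1 - s_i^2) * x_i
    dL_dw = sum(dL_dd * v * (1.0 - s * s) * x for s, x in zip(sents, xs))
    return dL_dw, dL_dv


def numeric_grads(w, v, xs, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    gw = (forward(w + eps, v, xs, y)[0] - forward(w - eps, v, xs, y)[0]) / (2 * eps)
    gv = (forward(w, v + eps, xs, y)[0] - forward(w, v - eps, xs, y)[0]) / (2 * eps)
    return gw, gv


def sgd_step(w, v, xs, y, lr=0.1):
    """One plain gradient-descent step that updates both 'encoders' together."""
    gw, gv = analytic_grads(w, v, xs, y)
    return w - lr * gw, v - lr * gv


if __name__ == "__main__":
    w, v, xs, y = 0.5, 1.5, [0.2, -0.4, 1.0], 1.0  # one document with three "sentences"
    print(analytic_grads(w, v, xs, y), numeric_grads(w, v, xs, y))
```

In PyTorch, calling the same `nn.Module` on every sentence (or on all sentences of a batch reshaped into one larger batch) and then calling `loss.backward()` performs this accumulation automatically, so one optimizer over all parameters is enough.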

Use a trained neural network to imitate its training data

I'm in the early stages of designing a prose imitation system. It will read a body of prose and then mimic it. It's mostly for fun, so the imitation doesn't need to make much sense, but I'd like to make it as good as I can with minimal effort.
My first idea is to use my example prose to train a classifying feed-forward neural network, which classifies its input as either part of the training data or not part. Then I'd like to somehow invert the neural network, finding new random inputs that also get classified by the trained network as being part of the training data. The obvious and stupid way of doing this is to randomly generate word lists and only output the ones that get classified above a certain threshold, but I think there is a better way, using the network itself to limit the search to certain regions of the input space. For example, maybe you could start with a random vector and do gradient descent optimisation to find a local maximum around the random starting point. Is there a word for this kind of imitation process? What are some of the known methods?
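The "start from a random vector and climb" idea amounts to gradient ascent on the input while the classifier's weights stay frozen. A minimal sketch follows, assuming the input is a continuous vector (for example concatenated word embeddings) and using a hypothetical logistic-regression classifier in place of the feed-forward network; mapping the optimized vector back to actual words is a separate step that is not shown. Be aware that inputs which maximize a classifier's score often do not resemble real data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(x, w, b):
    """Probability a frozen logistic classifier assigns to 'looks like training data'."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def ascend_input(x, w, b, lr=0.5, steps=50):
    """Gradient ascent on log score(x) with respect to the INPUT x (weights fixed).
    For logistic regression, d/dx log sigmoid(w.x + b) = (1 - sigmoid(w.x + b)) * w."""
    x = list(x)
    for _ in range(steps):
        p = score(x, w, b)
        x = [xi + lr * (1.0 - p) * wi for xi, wi in zip(x, w)]
    return x


if __name__ == "__main__":
    w, b = [1.0, -2.0, 0.5], -1.0            # hypothetical trained weights
    rng = random.Random(0)
    start = [rng.gauss(0.0, 1.0) for _ in w]  # random starting point
    end = ascend_input(start, w, b)
    print(score(start, w, b), score(end, w, b))
```

With a deeper network the gradient with respect to the input comes from backpropagation instead of a closed form, but the loop is the same.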
How about Generative Adversarial Networks (GAN, Goodfellow 2014) and their more advanced siblings like Deep Convolutional Generative Adversarial Networks? There are plenty of proper research articles out there, and also more gentle introductions like this one on DCGAN and this on GAN. To quote the latter:
GANs are an interesting idea that were first introduced in 2014 by a group of researchers at the University of Montreal led by Ian Goodfellow (now at OpenAI). The main idea behind a GAN is to have two competing neural network models. One takes noise as input and generates samples (and so is called the generator). The other model (called the discriminator) receives samples from both the generator and the training data, and has to be able to distinguish between the two sources. These two networks play a continuous game, where the generator is learning to produce more and more realistic samples, and the discriminator is learning to get better and better at distinguishing generated data from real data. These two networks are trained simultaneously, and the hope is that the competition will drive the generated samples to be indistinguishable from real data.
(DC)GAN should fit your task quite well.
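The "game" described in the quote is written in Goodfellow et al. (2014) as a minimax problem over a value function V: the discriminator D tries to maximize its log-likelihood of labelling real samples x as real and generated samples G(z) as fake, while the generator G tries to minimize it.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The same paper notes that, in practice, G is often trained to maximize log D(G(z)) instead, because log(1 - D(G(z))) gives weak gradients early in training, when the discriminator easily rejects generated samples.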

Input values of an ANN constructed with keras framework (using theano)

I want to construct a neural network that will be trained on data I create. My question is: what form should this data have? In other words, does Keras allow neural networks that take strings/characters as input? If it only accepts numbers, in what range should the inputs and outputs be?
The only condition on your input data, i.e., the features, is that it must be numerical. There isn't really any constraint on the range, but it's always a good idea to apply feature scaling, normalization, etc., so that features on very different scales don't destabilize training. Neural networks and other machine learning methods cannot accept strings (characters, words) directly, so you first need to convert strings to numbers. There are many ways to do that; the most common techniques include bag of words, tf-idf features and word embeddings.
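Two common scaling choices are min-max scaling to [0, 1] and standardization to zero mean and unit variance. A stdlib-only sketch of both for a single numeric feature column is below; scikit-learn's `MinMaxScaler` and `StandardScaler` do roughly the same per column. Whichever you choose, compute the statistics (min/max or mean/std) on the training data and reuse them for validation and test data.

```python
def min_max_scale(column):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def standardize(column):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


if __name__ == "__main__":
    print(min_max_scale([10, 20, 30]))  # -> [0.0, 0.5, 1.0]
    print(standardize([1.0, 2.0, 3.0]))
```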
The following tutorials (using scikit-learn) might be a good starting point:
http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-1-for-beginners-bag-of-words
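The scikit-learn tutorial above turns text into numbers with `CountVectorizer` and `TfidfVectorizer`. To show what such features look like, here is a stdlib-only sketch under simplifying assumptions (lowercasing, a regex tokenizer, unsmoothed idf = log(N / df)). Its numbers will not match scikit-learn's exactly, since `TfidfVectorizer` smooths the idf and L2-normalizes each row by default.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on runs of letters/digits (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(docs):
    """Assign each distinct token a column index, in sorted order."""
    return {tok: i for i, tok in enumerate(sorted({t for d in docs for t in tokenize(d)}))}


def bag_of_words(doc, vocab):
    """Count vector: entry i is how often vocabulary word i occurs in doc.
    Words that are not in the vocabulary are ignored."""
    counts = Counter(tokenize(doc))
    return [counts.get(tok, 0) for tok in sorted(vocab, key=vocab.get)]


def tf_idf(docs, vocab):
    """Raw counts times idf = log(N / df), where df is the number of documents
    containing the word (one common variant; terms absent from all docs get 0)."""
    bows = [bag_of_words(d, vocab) for d in docs]
    n = len(docs)
    df = [sum(1 for b in bows if b[j] > 0) for j in range(len(vocab))]
    return [[c * math.log(n / df[j]) if df[j] else 0.0 for j, c in enumerate(b)]
            for b in bows]


if __name__ == "__main__":
    docs = ["The cat sat", "The dog sat down"]
    vocab = build_vocab(docs)  # cat, dog, down, sat, the
    print(bag_of_words(docs[0], vocab))  # -> [1, 0, 0, 1, 1]
    print(tf_idf(docs, vocab))
```

Words that occur in every document ("the", "sat" here) get an idf of zero, which is how tf-idf down-weights uninformative words.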

How to use word2vec and RNN together?

I am using word2vec in Java and trying to train it so that it gives me vector representations of words and sentences.
Can I feed these vectors into a neural network to get a response based on the word2vec data? I am planning to build a chatbot with it.
Adding on to #galloguille's comments, you can use pre-trained word2vec word vectors to initialize the embedding layer of your RNN. An RNN can learn from a sequence of words to predict the next word(s). You can find a good example with code here: https://github.com/larspars/word-rnn.
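Initializing from word2vec usually means building an embedding matrix whose row i is the pretrained vector of the word with id i. A minimal sketch is below, assuming you have exported the Java word2vec vectors into a word-to-vector mapping and that word ids run from 1 to len(vocab), with id 0 reserved for padding (both are assumptions for illustration). The matrix could then be loaded as the initial weights of a Keras `Embedding` layer or passed to PyTorch's `nn.Embedding.from_pretrained`.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0, scale=0.1):
    """Return (matrix, missing_words). Row i holds the vector for the word with id i:
    - words found in `pretrained` (e.g. exported word2vec vectors) are copied;
    - words without a pretrained vector get small random values, to be learned;
    - row 0 is reserved for padding and stays all zeros.
    Assumes ids in `vocab` are exactly 1..len(vocab)."""
    rng = random.Random(seed)
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    missing = []
    for word, idx in vocab.items():
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)
        else:
            matrix[idx] = [rng.uniform(-scale, scale) for _ in range(dim)]
            missing.append(word)
    return matrix, missing


if __name__ == "__main__":
    vocab = {"hello": 1, "world": 2, "zzz": 3}
    pretrained = {"hello": [0.1, 0.2], "world": [0.3, 0.4]}  # toy 2-d "word2vec" vectors
    matrix, missing = build_embedding_matrix(vocab, pretrained, dim=2)
    print(matrix, missing)
```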
There is a good collection of resources on the current state of the art in chatbots here: https://stanfy.com/blog/the-rise-of-chat-bots-useful-links-articles-libraries-and-platforms/
As I understand it, most effective chatbots (at present) don't use an RNN directly to reply to a question. Instead, the first step tries to predict the intent of the question from a fixed set of intents. Based on that intent, they then compute some actionable information and a logical reply to the question.
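To make the "predict the intent first" step concrete, here is a hypothetical toy: a nearest-centroid classifier over averaged word vectors, where each intent is represented by the mean vector of a few example phrasings and a question is assigned the intent with the highest cosine similarity. The intents, example phrasings and 2-d vectors are made up for illustration; a real system would more likely train a classifier on labeled questions, and the reply logic for each intent is not shown.

```python
import math


def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of known words; all zeros if no word is known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_intent(question, intent_examples, word_vectors, dim):
    """Nearest-centroid intent classifier: each intent's centroid is the mean of its
    example sentence vectors; return (best_intent, similarity)."""
    q = sentence_vector(question.lower().split(), word_vectors, dim)
    best, best_sim = None, -2.0
    for intent, examples in intent_examples.items():
        vecs = [sentence_vector(e.lower().split(), word_vectors, dim) for e in examples]
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        sim = cosine(q, centroid)
        if sim > best_sim:
            best, best_sim = intent, sim
    return best, best_sim


if __name__ == "__main__":
    word_vectors = {"weather": [1.0, 0.0], "rain": [0.9, 0.1], "book": [0.0, 1.0],
                    "table": [0.1, 0.9], "tomorrow": [0.5, 0.5]}
    intents = {"get_weather": ["weather tomorrow", "will it rain"],
               "book_table": ["book a table"]}
    print(predict_intent("Will it rain tomorrow", intents, word_vectors, dim=2))
```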