Train mobileBERT from scratch for other languages - classification

I am thinking of training a mobileBERT model from scratch for the German language.
Can I take the English MobileBERT model from HuggingFace and apply it to a dataset in another language?
It seems to me that I would have to replace MobileBERT's teacher model with a BERT model for the corresponding language. Unfortunately, I could not find a parameter for changing the teacher model.
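For illustration, here is a rough sketch of what I imagine: a freshly initialised MobileBERT student distilled against a German teacher such as bert-base-german-cased with a simple logit-matching loss. The model names, hyperparameters, and loss are just my assumptions, not the layer-wise transfer procedure from the MobileBERT paper:

    # Rough sketch: fresh MobileBERT student, German BERT teacher.
    # Names and hyperparameters are assumptions, not a tested recipe.
    import torch
    import torch.nn.functional as F
    from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                              MobileBertConfig, MobileBertForMaskedLM)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
    teacher = AutoModelForMaskedLM.from_pretrained("bert-base-german-cased").eval()

    # Student shares the teacher's (German) vocabulary but starts from scratch.
    student_cfg = MobileBertConfig(vocab_size=teacher.config.vocab_size)
    student = MobileBertForMaskedLM(student_cfg)
    optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

    batch = tokenizer(["Ein Beispielsatz auf Deutsch."],
                      return_tensors="pt", padding=True)

    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    student_logits = student(**batch).logits

    # Distillation loss: match the student's token distribution to the teacher's.
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    loss.backward()
    optimizer.step()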
Are there any other ideas on how best to train a mobileBERT model for another language?
Best regards and many thanks!
ALPacker

Related

How to create a global forecasting model using deep learning?

I am aiming to build a global/general forecasting model (I don't know the proper terminology) using deep learning. The idea is to train a single model on several time series so that it can produce forecasts both for the series used during training and for series not seen during training.
I think this is a mixed classification-regression problem. If that is correct, how should I approach it?
Should I work only with the variable(s) I aim to forecast (using lagged observations), i.e. pass only the target variable(s) through the model (I think it could be a CNN-LSTM or an LSTMGC model)? If this is an option, I would appreciate an explanation, or a pointer to one, of how such a model works and how the training data should be structured.
Or should I also pass the categorical variables to the model? In the "product sales" case, the categorical variables would be things like region and product type, and the variables to forecast would be sales or the number of items sold. I have the same doubt here as before. (I also think this version of the model could be easier to interpret.)
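To make the data-structuring question concrete, here is a rough sketch of what I have in mind (all names, shapes, and the toy series are made up): lagged windows from every series as numeric input, plus an integer series/product identifier that could feed an embedding layer:

    # Sketch of windowing several series for one "global" model.
    # The toy series below are placeholders, not real data.
    import numpy as np

    def make_windows(series_dict, n_lags=12):
        """series_dict maps a series id (e.g. product/region) to its 1-D history."""
        X_lags, X_ids, y = [], [], []
        for series_id, values in series_dict.items():
            for t in range(n_lags, len(values)):
                X_lags.append(values[t - n_lags:t])   # lagged observations
                X_ids.append(series_id)               # categorical identifier
                y.append(values[t])                   # next value to forecast
        return np.array(X_lags), np.array(X_ids), np.array(y)

    # Two toy series; a real dataset would have one entry per product/region.
    series = {0: np.arange(100, dtype=float), 1: np.sin(np.linspace(0, 20, 100))}
    X_lags, X_ids, y = make_windows(series)
    # X_lags feeds the recurrent/convolutional part, X_ids feeds an embedding layer.
    print(X_lags.shape, X_ids.shape, y.shape)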
It would be a great help if anyone could point me to, or explain, a methodology for solving this kind of problem with deep learning: which ANN architectures are most typical, how the data should be prepared, and how the model should be trained.
Thanks in advance for all the help.
Original question posted on the Artificial Intelligence Stack Exchange.

Can BERT be used to train non-text sequence data for classification?

Can BERT be used for non-text sequence data? I want to try BERT for sequence classification problems, but the data is not text, so I would have to train BERT from scratch. How do I do that?
The Transformer architecture can be used for anything as long as the input is a sequence of discrete symbols. BERT is trained with the masked language model objective, i.e., it is trained to fill in a gap in a sequence based on the rest of the sequence. If your data is of that kind, you can train a BERT-like model on it. With sequences of continuous vectors, you would need to come up with a suitable alternative to masked language modeling.
You can follow any of the many tutorials that you can find online, e.g., from the Huggingface blog or towardsdatascience.com.
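As a minimal illustration (not taken from any particular tutorial), assuming your symbols have already been mapped to integer IDs, training a small BERT-style model with a masked-prediction objective could look roughly like this; all sizes and the random batch are placeholders:

    # Minimal sketch: non-text symbol sequences as token IDs, trained with
    # a masked-prediction objective. Sizes and data below are placeholders.
    import torch
    from transformers import BertConfig, BertForMaskedLM

    VOCAB_SIZE = 128          # number of distinct symbols + special tokens
    MASK_ID, PAD_ID = 1, 0    # reserve IDs for [MASK] and padding

    config = BertConfig(vocab_size=VOCAB_SIZE, hidden_size=128,
                        num_hidden_layers=4, num_attention_heads=4,
                        intermediate_size=256, pad_token_id=PAD_ID)
    model = BertForMaskedLM(config)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # Fake batch of symbol sequences (batch, seq_len); replace with real data.
    batch = torch.randint(2, VOCAB_SIZE, (8, 64))

    labels = batch.clone()
    mask = torch.rand(batch.shape) < 0.15      # mask ~15% of positions
    inputs = batch.masked_fill(mask, MASK_ID)
    labels[~mask] = -100                       # compute loss only on masked positions

    loss = model(input_ids=inputs, labels=labels).loss
    loss.backward()
    optimizer.step()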

A question on using ontologies for text classification

I want to classify short pieces of text (neuroscience-related) using an ontology (NIF). I have read some papers, but none of them went through the whole procedure of performing an ontology-based classification. Therefore, I wanted to double-check and see if I got it right:
Feature extraction: First, we use the ontology to annotate (tag) the text with ontology concepts. We parse the text and, for every term that can be found in the ontology, tag it with that term's class in the ontology.
I guess the accuracy of the annotation process can be increased by techniques such as finding semantic similarities between ontology terms and terms in the text, and by preprocessing steps such as lemmatization.
Since we use the ontology together with a rule-based classification system, there is no need for learning, and we can move straight to classification.
Then, for the classification phase, since we are using a rule-based classifier, we classify the text according to the classes assigned to it. Is this correct? Also, can we move up in the ontology and use super-classes during annotation in order to reduce the number of tags (classes) used in classification?
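To check my understanding, here is a toy sketch of the procedure I have in mind; the term-to-class dictionary and the super-class map are made-up examples, not actual NIF content:

    # Toy sketch of the annotate-then-classify idea described above.
    # The term/class dictionary and super-class map are invented examples.
    from collections import Counter

    TERM_TO_CLASS = {          # ontology concept annotations (hypothetical)
        "hippocampus": "BrainRegion",
        "amygdala": "BrainRegion",
        "dopamine": "Neurotransmitter",
        "serotonin": "Neurotransmitter",
    }
    SUPERCLASS = {             # optional roll-up to reduce the number of tags
        "BrainRegion": "Anatomy",
        "Neurotransmitter": "Molecule",
    }

    def annotate(text, use_superclasses=False):
        """Tag every token that matches an ontology term with that term's class."""
        tags = []
        for token in text.lower().split():        # real code would lemmatize here
            cls = TERM_TO_CLASS.get(token.strip(".,;"))
            if cls:
                tags.append(SUPERCLASS[cls] if use_superclasses else cls)
        return tags

    def classify(text):
        """Rule: assign the document to its most frequent annotated class."""
        tags = annotate(text)
        return Counter(tags).most_common(1)[0][0] if tags else "Unknown"

    print(classify("Dopamine release in the hippocampus and amygdala ..."))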
My other question is: are there enough benefits to using ontologies for classification? I read in a paper that using ontologies actually decreases accuracy! What are those benefits? What does using meaningful tags from the ontology allow us to do that arbitrary terms don't?

Genetic Programming in Agent Based Modeling with NetLogo

I have an Agent-Based model written in NetLogo. Now I want to take it to the next level and evolve my agents as a Genetic Programming population. I want a way to incorporate the genetic programming part into my NetLogo model, either through an interface or by writing it in NetLogo itself if that's possible. Does anybody have any insights into this?
Thank you

Cross Entropy for Language modelling

I'm currently working on a classification task using language modelling. The first part of the project involved using n-gram language models to classify documents with C5.0. The final part of the project requires me to use cross-entropy to model each class and classify test cases against these models.
Does anyone have experience using cross-entropy, or links to information about how to use a cross-entropy model on sample data? Any information at all would be great! Thanks.
You can find theoretical background on using cross-entropy with language models in various textbooks, e.g. "Speech and Language Processing" by Jurafsky & Martin, pages 116-118 in the 2nd edition.
As for concrete usage, most language modeling tools do not report cross-entropy directly but rather perplexity, which is the exponential of the cross-entropy (2^H when the cross-entropy H is measured in bits). The perplexity, in turn, can be used to classify documents: build one language model per class and assign a test document to the class whose model gives it the lowest perplexity. See, e.g., the documentation for the 'evallm' command in SLM, the Carnegie Mellon University language modeling toolkit (http://www.speech.cs.cmu.edu/SLM/toolkit_documentation.html).
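As a rough illustration (a toy sketch, not how SLM/evallm works internally), you could build a simple smoothed unigram model per class and assign a test document to the class whose model gives it the lowest perplexity; the tiny training texts below are placeholders:

    # Sketch of perplexity-based classification with add-one-smoothed
    # unigram models; the training texts are made-up examples.
    import math
    from collections import Counter

    def train_unigram(texts):
        counts = Counter(w for t in texts for w in t.lower().split())
        total = sum(counts.values())
        vocab = len(counts) + 1                               # +1 for unseen words
        return lambda w: (counts[w] + 1) / (total + vocab)    # add-one smoothing

    def cross_entropy(prob, text):
        words = text.lower().split()
        return -sum(math.log2(prob(w)) for w in words) / len(words)

    def classify(models, text):
        # Perplexity = 2 ** cross-entropy; the lowest perplexity wins.
        scores = {c: 2 ** cross_entropy(p, text) for c, p in models.items()}
        return min(scores, key=scores.get)

    models = {
        "sports": train_unigram(["the team won the game", "he scored a goal"]),
        "finance": train_unigram(["the market fell today", "shares rose sharply"]),
    }
    print(classify(models, "the team scored late in the game"))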
good luck :)