DistilBERT model is not working in ktrain

I tried to use the DistilBERT classifier, but I am getting the following error.
This is my code:
(X_train, y_train), (X_test, y_test), prepro = text.texts_from_df(
    train_df=data_train, text_column="Cleaned", label_columns=col,
    val_df=data_test, maxlen=500, preprocess_mode="distilbert")
And here is the error:
OSError: Model name 'distilbert-base-uncased' was not found in tokenizers model name list (distilbert-base-uncased, distilbert-base-uncased-distilled-squad, distilbert-base-cased, distilbert-base-cased-distilled-squad, distilbert-base-german-cased, distilbert-base-multilingual-cased). We assumed 'distilbert-base-uncased' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
Because of constraints in my office environment, I can only work with TensorFlow 2.2 and Python 3.8. Right now I am using ktrain 0.19.
Do you think it will affect my current environment if I downgrade ktrain to 0.16?

This error may happen if there is a network or firewall issue preventing download of the tokenizer files. See this FAQ entry for remedies.
Also, when you use preprocess_mode='distilbert', texts_from* functions return TransformerDataset instances, not arrays. You'll need to replace (X_train, y_train) with train_data, for example. See this example notebook.
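As a rough sketch of that second point, assuming ktrain and TensorFlow are installed and the tokenizer files can be downloaded, the call from the question could unpack dataset objects instead of array tuples. The function name build_distilbert_learner and the batch size of 6 are illustrative choices, not something from the question.

```python
def build_distilbert_learner(data_train, data_test, col, batch_size=6):
    """Sketch: preprocess two DataFrames for DistilBERT and wrap a classifier.

    data_train, data_test and col mirror the variable names in the question.
    batch_size=6 is an arbitrary illustrative value.
    """
    # Imported lazily so this file can be loaded without ktrain installed.
    import ktrain
    from ktrain import text

    # With preprocess_mode="distilbert" the first two return values are
    # dataset objects, so they are not unpacked into (X, y) tuples.
    train_data, val_data, preproc = text.texts_from_df(
        train_df=data_train,
        text_column="Cleaned",
        label_columns=col,
        val_df=data_test,
        maxlen=500,
        preprocess_mode="distilbert",
    )
    model = text.text_classifier("distilbert", train_data=train_data, preproc=preproc)
    learner = ktrain.get_learner(
        model, train_data=train_data, val_data=val_data, batch_size=batch_size
    )
    return learner, preproc
```

The returned preproc object is kept because it is needed later to build a predictor for new text.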

Dymola error message when translating: "unknown internal error in Dymola"

I have problems translating Modelica models with Dymola 2020: When I try to translate the models, the following error message appears:
"unknown internal error in Dymola".
The model was translating and simulating a couple of days ago, and the same model still runs on the computer of other colleagues. I didn't change the compiler in between nor the Dymola version. I've also restarted the computer but the problem persists.
Also, other models are still translating, so not all models are affected by this error.
Does anyone have a clue how to debug this error? Thank you very much for all hints!
The most likely explanation would be some weird setting of some flag.
You can see if you have any odd settings of normal flags by:
Dymola 2020: Edit>Options>General>Flags... Check "Non-default"
Dymola 2020x: Tools>Options>General>Flags... Check "Non-default"
(If it is a non-Boolean setting it is a bit messier.)
That is assuming it is really the same model and there is no difference in any model in the path (including working directory).
Frankly speaking, if you get "unknown internal error in Dymola" you should report it to technical support at Dassault Systèmes (through your reseller), and let them debug it.
It is not your job to debug such errors.
Have you tried to delete the content from the working directory (WD)?
Sometimes there are artifacts, which mess up the compilation of a specific model.
You can check where it is using either:
- the GUI: File -> Working Directory -> Copy Path, then paste it into the Explorer
- the command line: type cd, which returns the path to the WD
Then make sure that there are no important files in the WD (usually .mo files) and finally delete the full content of the directory.
Note: Make sure the WD is on a local path; otherwise performance can take a serious hit. It is also usually a good idea to keep the WD separate from the directory where models are stored.
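The cleanup described above could also be scripted outside Dymola. The following is a minimal Python sketch under the assumption that the WD path has already been copied from Dymola; the function name clear_working_directory is made up for illustration. It refuses to delete anything while .mo files are still present.

```python
from pathlib import Path
import shutil


def clear_working_directory(wd, protected_suffixes=(".mo",)):
    """Delete everything inside wd, unless protected files are present.

    Returns the list of protected files found. If that list is non-empty,
    nothing is deleted, so the files can be moved elsewhere first.
    """
    wd = Path(wd)
    protected = sorted(
        p for p in wd.rglob("*")
        if p.is_file() and p.suffix.lower() in protected_suffixes
    )
    if protected:
        return protected  # refuse to delete; rescue these files first

    for entry in wd.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subdirectory and its content
        else:
            entry.unlink()  # remove a plain file or a symlink
    return []
```

Calling clear_working_directory(path_to_wd) returns an empty list once the directory has been emptied, or the list of .mo files that blocked the cleanup.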

Visualize an embedded neo4j instance in a web browser using default visualization

I am using embedded Neo4j, version 3.0.3. Following this guide, I have created Neo4j/Java code. It creates a database, adds two nodes (one for java, one for scala) and adds a relationship.
package examples;

import java.io.File;
import org.neo4j.graphdb.*;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class HelloWorld {
    public static void main(String[] args) {
        GraphDatabaseFactory dbFactory = new GraphDatabaseFactory();
        GraphDatabaseService db = dbFactory.newEmbeddedDatabase(new File("Test_DB"));
        try (Transaction tx = db.beginTx()) {
            Node javaNode = db.createNode(Tutorials.JAVA);
            javaNode.setProperty("TutorialID", "JAVA001");
            Node scalaNode = db.createNode(Tutorials.SCALA);
            scalaNode.setProperty("TutorialID", "SCALA001");
            Relationship relationship = javaNode.createRelationshipTo(scalaNode, TutorialRelationships.JVM_LANGUAGES);
            relationship.setProperty("Id", "1234");
            tx.success();
        }
    }
}

enum Tutorials implements Label {
    JAVA, SCALA, SQL, NEO4J;
}

enum TutorialRelationships implements RelationshipType {
    JVM_LANGUAGES, NON_JVM_LANGUAGES;
}
I program using Eclipse, so all the libraries are imported and I can just click the 'run' button on Eclipse to get the code running, and it seems to work without any issues. Upon running the code, I now have a folder Test_DB in the ~/workspace/project_name/Test_DB directory, where project_name is the name of the overall Eclipse folder. My goal is now to visualize this database in a web browser. The guide I linked to earlier shows an example of this; the user was able to look at the nodes in the web browser (see the bottom of the webpage). Unfortunately, I am using a Linux computer with Firefox, and that tutorial was in Windows, and I can't figure out how to get the visualization.
There have been a few other questions related to this. Unfortunately, some of them (such as this one) propose using software other than the default visualization. I don't own the computer and I have to go through a roundabout process to get external code installed. To be clear what I mean, this link discusses the default Neo4j browser. This is what I would like to see.
This question here directly tackles the same issue, and in fact, it uses the exact same tutorial I used! The answer proposes changing the path in the neo4j-server.properties file. Unfortunately, that file doesn't exist, and upon further analysis, it seems like Neo4j 3.0 changed the configuration naming, which I found out by reading the answer to this similar question. There is now a file conf/neo4j.conf with this information. I entered the following information in the first few lines, keeping the other settings the default:
# The name of the database to mount
dbms.active_database=Test_DB
# Paths of directories in the installation.
dbms.directories.data=/home/username/workspace/project_name/
This does not appear to work. Am I using these settings correctly? When I open the neo4j web browser after running ./bin/neo4j start and click on the database symbol in the left hand side, I see "Name: Test_DB", but it also says there are no nodes and no relationships in the database, and returning a match all query provides nothing. Is it possible for the browser to connect to my database so it can see the nodes (e.g., the two nodes in my Java code above)?
Or is it that I'm not using this code correctly; does the code somehow have to avoid quitting (i.e., replace tx.success() with something else?) to keep the data there?
Sorry about answering my own question, but I finally figured out how to do this! Here's what happens: according to the github change log for 3.0.0.RC-1:
Databases are now stored in a directory called databases under the directory specified in dbms.directories.data
So what we actually have to do is make sure our database is in the following location:
/home/username/workspace/project_name/databases/
The issue is that when we run it in Eclipse, we get the database in the following folder:
/home/username/workspace/project_name/
Thus, the solution is to create the database inside a directory named databases, i.e., I would change one line to:
GraphDatabaseService db = dbFactory.newEmbeddedDatabase(new File("databases/Test_DB"));
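To make the path relationship explicit, here is a small Python sketch (not Neo4j code) of how the two settings combine according to the changelog quote above. The function names are hypothetical, and it assumes the relative path given to newEmbeddedDatabase is resolved against the Eclipse project directory, as described in this answer.

```python
from pathlib import Path


def server_database_path(data_dir, active_database):
    """Directory the server reads: <dbms.directories.data>/databases/<name>."""
    return Path(data_dir) / "databases" / active_database


def embedded_store_matches(project_dir, embedded_relative_path, data_dir, active_database):
    """True if the embedded code writes where the server will look."""
    embedded = Path(project_dir) / embedded_relative_path
    return embedded == server_database_path(data_dir, active_database)


project = "/home/username/workspace/project_name"
# Original code: new File("Test_DB")
print(embedded_store_matches(project, "Test_DB", project, "Test_DB"))  # False
# Fixed code: new File("databases/Test_DB")
print(embedded_store_matches(project, "databases/Test_DB", project, "Test_DB"))  # True
```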

Kettle getStepMetaInterface() function error in Modified Script with MongoDB Output step

I am using Pentaho 5.3 and the Modified Java Script Value step to inject data.
I want to dynamically set the path and names in the MongoDB Output step. Here is my code:
var meta = new org.pentaho.di.trans.TransMeta( source_path );
var mongoStep = meta.findStep("MongoDB Output");
mongoStep.setDescription('This is MongoDB Output by Ray');
Alert(mongoStep.getName()); // code is ok until here.
var mongoStepMeta = mongoStep.getStepMetaInterface() // error occurs here
The error occurs when I call getStepMetaInterface() in order to use the step's functions:
java.lang.LinkageError: loader constraint violation: loader (instance of org/pentaho/di/core/plugins/KettleURLClassLoader) previously initiated loading for a different type with name "org/pentaho/metastore/api/IMetaStore"
This error seems to be caused by a .jar conflict.
But when I use built-in steps, like "Microsoft Access Input", getStepMetaInterface() succeeds, and I can use all the functions defined in AccessInputMeta.java.
I suspect the problem is related to the MongoDB Output step, because it is a plugin for Kettle, but I am not sure.
Any response is appreciated!
Yep, it's because of the plugin system. The Modified Java Script step and the Access Input step (among many others) are all in the Kettle engine and share a classloader. External plugins (like MongoDB Input/Output) have their own isolated classloaders. You have to do some voodoo with thread context classloaders and reflection in order to get at the Meta classes for those steps.
Here's an example of the hoops you have to jump through to get at external plugins (the Big Data plugin in this example) from the Modified Java Script step:
https://github.com/brosander/pentaho-hadoop-shims/blob/verification-2/test/verification/jobs/dependencies/ForceHiveToConnectRemotely.ktr
First, thanks to Mass, who solved the problem.
Here is his advice:
This problem is caused by a jar file conflict.
The Pentaho installation contains two copies of the same jar file:
- lib/metastore-5.3.0.0-213.jar
- plugins/pentaho-mongodb-plugin/lib/metastore-5.3.0.0-213.jar
Remove the second one (the copy under plugins/) and the error disappears.
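To look for other duplicated jars of this kind, one option is a short scan of the installation directory. This Python sketch is only an illustration: find_duplicate_jars is a hypothetical helper, and a shared file name is a hint worth checking, not proof of a classloader conflict.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_jars(install_dir):
    """Map each jar file name that occurs more than once to its locations."""
    locations = defaultdict(list)
    for jar in Path(install_dir).rglob("*.jar"):
        locations[jar.name].append(jar)
    # Keep only names that appear in at least two places.
    return {
        name: sorted(paths)
        for name, paths in locations.items()
        if len(paths) > 1
    }
```

Running find_duplicate_jars on the Pentaho installation directory would be expected to list metastore-5.3.0.0-213.jar with both of the locations named above. Jars of the same library with different version numbers have different file names and are not caught by this check.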

Data.c file generation

I'm very new to MATLAB. I'm working on software that needs the following files as input for a particular Simulink model: model.c, model.h, and model_data.c. I have a model for which I can't generate the model_data.c file using RTW. I have tried to find information on the files generated by RTW, but I didn't find enough. If anybody knows RTW, please let me know which blocks are required to generate model_data.c.
Thank you.
model_data.c is a conditionally created file (i.e. it is only created if it is needed, which depends on the way the model is set up for code generation).
For a discussion of the Simulink Coder build process, and what files get generated when, search the doc for the section titled "Files and Folders Created by Build Process".
For others who need help in the future:
Open the Configuration Parameters pane. Go to Code Generation -> Optimization and make sure that Default parameter behavior is set to Tunable.

Weka EM cluster get "Error: Could not find or load main class test" in eclipse

I want to use Weka in a JSP application to cluster tweets stored in a database. In the GUI, I find that only HierarchicalClusterer and FilteredClusterer are available for string clustering. Then I found this ClusteringDemo sample code on the official Weka website: https://svn.scms.waikato.ac.nz/svn/weka/trunk/wekaexamples/src/main/java/wekaexamples/clusterers/ClusteringDemo.java
However, after setting up the sample ARFF file in the Weka directory, I get this error: "Error: Could not find or load main class ClusteringDemo".
Can anyone help me to find out the reason?
I only changed filename in the statement data = DataSource.read(filename);. Besides, my classpath is set up correctly, since I have already run some classifiers.
1.- Maybe ClusteringDemo.class is not in your classpath.
You should add the class or jar file to your project.
2.- Anyway, you can download the Java code from: http://weka.wikispaces.com/file/detail/ClusteringDemo.java
Compile and run it (make sure that weka.jar is in your classpath).
3.- If you have added ClusteringDemo.java to your project, make sure that its "package" line (the first line) matches its location. Otherwise Java will not be able to find it.
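The check in point 3 can be automated. The sketch below is a Python helper, not part of Weka or Eclipse, and its function names are hypothetical. It compares the package declared in a .java file with the package implied by the file's folder under the source root. It uses a simple regular expression, so a package statement inside a comment would fool it.

```python
import re
from pathlib import Path

_PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)


def expected_package(java_file, source_root):
    """Package implied by the file's folder relative to the source root."""
    folder = Path(java_file).resolve().parent
    return ".".join(folder.relative_to(Path(source_root).resolve()).parts)


def declared_package(java_file):
    """Package named in the source file, or "" for the default package."""
    source = Path(java_file).read_text(encoding="utf-8")
    match = _PACKAGE_RE.search(source)
    return match.group(1) if match else ""


def package_matches_location(java_file, source_root):
    """True if the declared package agrees with the folder layout."""
    return declared_package(java_file) == expected_package(java_file, source_root)
```

For a file stored at src/wekaexamples/clusterers/ClusteringDemo.java with source root src, the declared package would have to be wekaexamples.clusterers for the check to pass.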
Good luck using EM, maybe you can also try N-grams + Naive Bayes.