How to train a Natural Language Classifier using Fluent - ibm-cloud

I'm using the Apache HttpClient Fluent library to fire a request at the Natural Language Classifier service in order to train it on my data.
The documentation says the following parameters are to be passed:
name=training_data; type=file; description=training data
name=training_meta_data; type=file; description=meta data to identify language etc
Below is my code sample:
File trainingCSVFile = new File("path to training file");
Request request = Request.Post(<bluemix service url>)
        .bodyFile(trainingCSVFile, ContentType.TEXT_PLAIN)
        .bodyString("{\"language\":\"en\",\"name\":\"PaymentDataClassifier\"}", ContentType.APPLICATION_JSON);
However, I'm getting an internal server error, which is plausibly due to my request format. Can anyone help me pass the above-mentioned parameters using the Fluent library?

I'm going to assume that you are using Java, and suggest that you use the Java SDK. There you can find examples for not only the Natural Language Classifier but all the Watson services plus the Alchemy services.
Installation
Download the jar
or use Maven
<dependency>
  <groupId>com.ibm.watson.developer_cloud</groupId>
  <artifactId>java-sdk</artifactId>
  <version>2.10.0</version>
</dependency>
or use Gradle
'com.ibm.watson.developer_cloud:java-sdk:2.10.0'
The code snippet to create a classifier is:
// Credentials come from your Bluemix service instance
NaturalLanguageClassifier service = new NaturalLanguageClassifier();
service.setUsernameAndPassword("<username>", "<password>");

// Create the classifier and start training it from a CSV of labeled examples
File trainingData = new File("/path/to/csv/file.csv");
Classifier classifier = service.createClassifier("PaymentDataClassifier", "en", trainingData);
System.out.println(classifier);
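Training is asynchronous; while it runs you can poll the classifier's status. A small sketch (assuming the getClassifier and getStatus accessors from the same SDK version):
// Poll the service until the classifier reports Available
Classifier status = service.getClassifier(classifier.getId());
System.out.println(status.getStatus());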
The training duration will depend on your data, but once the classifier is trained you can do:
Classification classification = service.classify(classifier.getId(), "Is it sunny?");
System.out.println(classification);
Feel free to open an issue in the GitHub repo if you have problems.
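If you would rather stay with the Fluent API: the training endpoint expects a single multipart/form-data body carrying both parts, whereas chaining bodyFile() and bodyString() just replaces one body with the other. A minimal sketch, assuming the httpmime module is on the classpath and reusing the field names from the documentation you quoted (add your service credentials as usual):
import java.io.File;
import org.apache.http.HttpEntity;
import org.apache.http.client.fluent.Request;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.mime.MultipartEntityBuilder;

// Build one multipart body that carries the CSV and the JSON metadata
File trainingCSVFile = new File("/path/to/training.csv");
HttpEntity entity = MultipartEntityBuilder.create()
        .addBinaryBody("training_data", trainingCSVFile, ContentType.TEXT_PLAIN, trainingCSVFile.getName())
        .addTextBody("training_meta_data", "{\"language\":\"en\",\"name\":\"PaymentDataClassifier\"}", ContentType.APPLICATION_JSON)
        .build();

// Send it with the Fluent API and print the service response
String response = Request.Post(<bluemix service url>)
        .body(entity)
        .execute()
        .returnContent()
        .asString();
System.out.println(response);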

Related

libraryDependencies for `TFNerDLGraphBuilder()` for Spark with Scala

Can anyone tell me what the libraryDependencies entry for TFNerDLGraphBuilder() should be for Spark with Scala? It gives me the error: Cannot resolve symbol TFNerDLGraphBuilder
I see it works in a notebook, as shown below:
https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Public/4.NERDL_Training.ipynb
TensorFlow graphs in Spark NLP are built using the TensorFlow Python API. As far as I know, a Java version for creating the Conv1D/BiLSTM/CRF graph is not included.
So, you need to create the graph first, following the instructions at:
https://nlp.johnsnowlabs.com/docs/en/training#tensorflow-graphs
That will create a TensorFlow .pb file that you then point the NerDLApproach annotator to. For example:
val nerTagger = new NerDLApproach()
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")
  .setLabelColumn("label")
  .setMaxEpochs(100)
  .setRandomSeed(0)
  .setPo(0.03f)
  .setLr(0.2f)
  .setDropout(0.5f)
  .setBatchSize(100)
  .setVerbose(Verbose.Epochs)
  .setGraphFolder(tfGraphPath) // folder containing the generated .pb graph
Note that you have to include the embeddings annotator first, and that the training process executes on the driver. It is not distributed, as it could be with BigDL.
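For context, here is a minimal sketch of what "including the embeddings first" looks like in a pipeline; the pretrained model name and the trainingData frame are illustrative:
import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel
import org.apache.spark.ml.Pipeline

// Embeddings must run before NerDLApproach so the "embeddings" column exists
val embeddings = WordEmbeddingsModel.pretrained("glove_100d")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(embeddings, nerTagger))
val model = pipeline.fit(trainingData) // trainingData must provide the "label" column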

How to save models in PyTorch from TPU to CPU

I am training a neural network model with PyTorch.
Since this model is very complicated, I made use of the pytorch_xla package so I could train on a TPU. I have finished training the model and now I want to save the weights so that I can use them from any environment.
I tried to save the weights like so:
file_name = "model_params"
torch.save(model.state_dict(), file_name)
and when I tried to load them (from an environment which does not support TPUs)
model.load_state_dict(torch.load(file_name))
I got the following error:
NotImplementedError: Could not run 'aten::empty_strided' with arguments from the 'XLA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty_strided' is only available for these backends: [CPU, Meta, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
Is there a way to do what I want?
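A hedged workaround sketch: the checkpoint written above still contains XLA tensors, so move everything to CPU before saving; torch_xla's xm.save does exactly that, after which a plain CPU load works:
import torch
import torch_xla.core.xla_model as xm  # available in the TPU environment

file_name = "model_params"

# Option 1: xm.save moves XLA tensors to CPU before writing the checkpoint.
xm.save(model.state_dict(), file_name)

# Option 2: copy every tensor in the state dict to CPU explicitly.
torch.save({k: v.cpu() for k, v in model.state_dict().items()}, file_name)

# Later, in a CPU-only environment:
model.load_state_dict(torch.load(file_name, map_location="cpu"))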

Visualize the Allotrope ontology suite

I recently found this ontology suite:
https://bioportal.bioontology.org/ontologies/AFO/?p=summary
I want to explore its contents. Normally I use http://www.visualdataweb.de/webvowl/ for such purposes. Example (Basic Formal Ontology):
http://www.visualdataweb.de/webvowl/#iri=http://purl.obolibrary.org/obo/bfo.owl
However, I was not able to find the IRI for the AFO ontology, or at least some URL which produces a visualization at the WebVOWL service.
Question: How can I visualize the content of AFO?
You can try to have a look at it on the Ontology Lookup Service (OLS).
Full disclosure: I am responsible for OLS, but not for AFO.

Horovod Timeline and MPI Tracing in Azure Machine Learning Workspace (MPI Configuration)

All,
I am trying to train a distributed model using Horovod on Azure Machine Learning Service as shown below.
estimator = TensorFlow(source_directory=script_folder,
                       entry_script='train_script.py',
                       script_params=script_params,
                       compute_target=compute_target_gpu_4,
                       conda_packages=['scikit-learn'],
                       node_count=2,
                       distributed_training=MpiConfiguration(),
                       framework_version='1.13',
                       use_gpu=True)
run = exp.submit(estimator)
How do I enable the Horovod timeline?
How do I enable more detailed MPI tracing to see the communication between the nodes?
Thanks.
The setup above uses the TensorFlow estimator class from the SDK, with distributed_training set to MpiConfiguration(), which is the right starting point. Here is another sample, using Horovod to train a GenSen sentence-similarity model:
https://github.com/microsoft/nlp-recipes/blob/46c0658b79208763e97ae3171e9728560fe37171/examples/sentence_similarity/gensen_train.py
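On the timeline question: Horovod reads the HOROVOD_TIMELINE environment variable when it initializes, so one approach (a sketch, not Azure-ML-specific; the output path is illustrative) is to set it at the top of train_script.py and write under ./outputs so the run keeps the file as an artifact:
import os

# Horovod picks this up at hvd.init() time; ./outputs is uploaded with the run
os.environ["HOROVOD_TIMELINE"] = os.path.join("outputs", "horovod_timeline.json")

import horovod.tensorflow as hvd
hvd.init()
For lower-level MPI tracing you would have to enable your MPI implementation's own verbosity switches, which is outside the SDK itself.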

Weka EM clustering gets "Error: Could not find or load main class test" in Eclipse

I want to use Weka to cluster tweets from a database in JSP. In the GUI, I find only HierarchicalClusterer and FilteredClusterer available for string clustering. Then I found the ClusteringDemo sample code on the official Weka website: https://svn.scms.waikato.ac.nz/svn/weka/trunk/wekaexamples/src/main/java/wekaexamples/clusterers/ClusteringDemo.java
However, after setting up the sample ARFF file in the Weka directory, I get the error "Error: Could not find or load main class ClusteringDemo".
Can anyone help me find out the reason?
I only changed the filename in the statement data = DataSource.read(filename);. Also, my classpath is set up correctly, as I have already run some classifiers.
1.- Maybe ClusteringDemo.class is not in your classpath.
You should add the class or jar file to your project.
2.- Anyway, you can download the Java code from: http://weka.wikispaces.com/file/detail/ClusteringDemo.java
Compile and run it (make sure that weka.jar is in your classpath).
3.- If you have added ClusteringDemo.java to your project, make sure that its "package" line (the first line) matches its location; otherwise Java will not be able to find the class.
Good luck using EM (a minimal API example follows); maybe you can also try n-grams + Naive Bayes.
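For the EM part, a minimal sketch against the Weka API, assuming weka.jar is on the classpath; the file path and class name are illustrative:
import weka.clusterers.EM;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EMClusterDemo {
    public static void main(String[] args) throws Exception {
        // Load the dataset from an ARFF file (path is illustrative)
        Instances data = DataSource.read("/path/to/tweets.arff");

        // Build the EM clusterer with default options
        EM clusterer = new EM();
        clusterer.buildClusterer(data);

        // Prints the fitted model: number of clusters and per-cluster stats
        System.out.println(clusterer);
    }
}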