How to import a saved ML model in Scala? - scala

I saved a linear regression model locally, which I want to use in later applications.
How can I import and call a saved model which was not created in the current application?
I'm using Scala on IntelliJ.
This is how I saved the model:
LRmodel.write.save(ORCpath+"LinearRegModel")

To load the model, you need to use the same model class that produced the saved files, so loading it in a new environment is not entirely straightforward.
I.e., you cannot load a model that contains 4 nodes into a model that contains 10.
You can see an example here: https://spark.apache.org/docs/latest/mllib-linear-methods.html#logistic-regression
That example is for logistic regression in the RDD-based API, where the method to load is:
val sameModel = LogisticRegressionModel.load(
sc,
"target/tmp/scalaLogisticRegressionWithLBFGSModel"
)

A different application isn't a problem. You can simply load through the save path as per @Wonay's instruction.
The issue arises when you move to another file system, say from local to Hadoop, or just to another local PC. In that case, frankly, it is best to just regenerate the model on your new filesystem.
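For a model saved with `write.save` as in the question, loading from a separate application might look like the sketch below. It assumes the saved model is an `org.apache.spark.ml.regression.LinearRegressionModel` from the DataFrame-based API; the object name, the `loadSavedModel` helper, and the default path are hypothetical, and the path has to be the same one that was passed to `save`.

```scala
import org.apache.spark.ml.regression.LinearRegressionModel
import org.apache.spark.sql.SparkSession

object LoadSavedModel {
  // Loads a model previously written with `model.write.save(path)`.
  // The model class used here has to match the class that was saved.
  def loadSavedModel(path: String): LinearRegressionModel =
    LinearRegressionModel.load(path)

  def main(args: Array[String]): Unit = {
    // A local session is assumed here; adjust the master for a cluster.
    val spark = SparkSession.builder()
      .appName("LoadSavedModel")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical default: pass the real save path as the first argument.
    val modelPath = args.headOption.getOrElse("LinearRegModel")
    val model = loadSavedModel(modelPath)

    // The loaded model exposes the fitted parameters and can be used
    // with `model.transform(dataFrame)` like the original one.
    println(s"coefficients = ${model.coefficients}, intercept = ${model.intercept}")

    spark.stop()
  }
}
```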

Related

How do you generate data from a file in scalacheck?

I want to run scalacheck on a sample dataset that I have in a file. How do I create a generator that reads data from this file and allows me to check a property on it?
You could read in all the data in advance and then use
Gen.oneOf(dataSet)
to randomly choose one of the values in the set.
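If, however, the data set is too big to read in at once, you could just generate the access index using
Gen.choose(1, setSize)
and read only the selected entry. Note that Gen.choose is inclusive at both ends, so with zero-based indexing the range would be 0 to setSize - 1.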
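Both approaches can be sketched as follows. This is only an illustration under assumptions: the file is taken to hold one value per line, the names `FileGenerators`, `readAll`, `fromLoadedData`, `fromIndex`, and `sample-data.txt` are hypothetical, and `scala.util.Using` requires Scala 2.13 or later.

```scala
import org.scalacheck.{Gen, Prop, Properties}
import scala.io.Source
import scala.util.Using

object FileGenerators {
  // Reads every line of the file into memory up front.
  def readAll(path: String): Vector[String] =
    Using.resource(Source.fromFile(path))(_.getLines().toVector)

  // Picks one of the preloaded values at random.
  // Gen.oneOf needs a non-empty collection.
  def fromLoadedData(dataSet: Vector[String]): Gen[String] =
    Gen.oneOf(dataSet)

  // Generates a zero-based line index, then reads only that line.
  // Each sample reopens the file and skips to the chosen line.
  def fromIndex(path: String, setSize: Int): Gen[String] =
    Gen.choose(0, setSize - 1).map { i =>
      Using.resource(Source.fromFile(path))(_.getLines().drop(i).next())
    }
}

object SampleDataSpec extends Properties("SampleData") {
  // Hypothetical file name; replace with the real sample data set.
  private val dataSet = FileGenerators.readAll("sample-data.txt")

  // Example property: every value drawn from the file is non-empty.
  property("lines are non-empty") =
    Prop.forAll(FileGenerators.fromLoadedData(dataSet))(_.nonEmpty)
}
```

Since the generator picks values at random, a property run may test some entries several times and others not at all; to check every entry exactly once, iterating over the data directly may be the simpler choice.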

Auto-Numeric Nugget Ignores Splits in SPSS Modeler

I'm trying to explore a continuous target variable in SPSS Modeler v. 18.2, using a split variable ("Cohort"). In other models that have a nominal target variable, I'm able to use the auto-classifier to generate models on each split, but in this model, when I use the auto-numeric node, it ignores the splits entirely. Here is the stream:
In the data file, I have "Cohort" set to Split:
In the node, in the Fields tab, I have added Cohort to the splits...
...and in the Model tab I have checked the build model for each split box:
The nugget doesn't include the splits; in the Summary tab it doesn't look like the split field is in the model at all:
My workaround is to use Select nodes for each split, but that has disadvantages. Thank you in advance for any help or corrections.
I am currently using IBM SPSS Modeler 18.0 but I am seeing the exact same behavior when using one of the demo data sets supplied with Modeler. I would consider this to be a defect and something that would need to be addressed by IBM's development team.
I suggest that you replicate the issue with one of the data sets from the "Demos" folder, such as "car_insurance_claims.sav", and then open a support ticket with IBM SPSS technical support to have this resolved.

How to call a SAS dataset by its label or where to check its name

I have a problem in dealing with SAS Enterprise Guide that runs on the server of my client.
I do not have access to the libraries, so the only way we can use the datasets is to store them on the local C: disk of the computer and drag them into SAS.
We cannot create libraries because the server does not read local paths.
Once you drag a table, let's call it "mydata" in SAS, the table is automatically renamed "mydata9865" with random numbers at the end and "mydata" is its label.
If you right-click the table and go to properties, you can't find the name of the table, just the label.
The only way I found to check the real name of the dataset is to open the Query Builder and check the name in the code preview.
The problem is that I am dealing with tables of millions of records and the machine I am using is very slow, so whenever I open the Query Builder just to check the table's name, it sometimes takes as long as an hour.
I am not a SAS expert, so I am sure there is a smarter way to do so. Is it possible for instance to use the table by calling it with its label?
data mydata2;
set mydata;
run;
instead of
set mydata9865?
Or is there some place I can rapidly check the name of the table without going through the query builder?
I tried to google it but I can't find anything, I hope someone will be able to help me!
Thank you in advance
Hover the mouse pointer over a data node to see its attributes. The data set name is the File name: value.
For example:
In this example I had renamed the nodes created by two different queries to be the same (doable: yes, smart: maybe not). NOTE: A data node Label: is not necessarily the same as its underlying data set's label metadata.
Regarding
use the table by calling it with its label?
Two nodes can have the same label, a situation that defeats this approach.
Use the COPY task to upload your data explicitly. It sounds like you're not adding your data to the project properly, so SAS automatically assigns a name, which it would not do if you explicitly imported or loaded your data.
Problem solved! I should have simply uploaded the data to the server with Tasks->Data->Upload Data Sets to Server, but I didn't know this task existed, so I didn't know it was possible at all!
https://communities.sas.com/t5/SAS-Enterprise-Guide/Importing-sas-data-sets-from-C-drive-into-SAS-EG-not-possible/td-p/135184
Thank you everybody for your help!

Running NetLogo headless on the cloud

I've written a NetLogo model to model agent movement in a landscape. I'd like to run this model from the command prompt, using AWS/Google Compute. The model uses about 500MB worth of input rasters and shapefiles and writes rasters and csv files. It also uses the extensions gis, rnd, cf, table and csv.
Would this be possible using the Controlling API? (https://github.com/NetLogo/NetLogo/wiki/Controlling-API). Can I just use the steps listed in the link? I have not tried running NetLogo from the command prompt before.
Also, I do not want to run BehaviorSpace as it is not relevant to this model.
A BehaviorSpace experiment can consist of only a single run, so BehaviorSpace may actually be relevant to you here. You only need to write one short XML file (or no new files at all, if the experiment setup you want is already part of the model) to do it this way.
Whereas if you go through the controlling API, you will have to write and compile Java (or Scala) code, which is a substantially more complex task.
But if you decide to go the controlling API route: yes, that works too, and it is documented, as you've already noticed.
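For a sense of what the controlling API route involves, here is a minimal Scala sketch along the lines of the examples on the wiki page linked in the question. It assumes the NetLogo jar and its dependencies are on the classpath and that the model defines `setup` and `go` procedures; the object name, the model file name, and the tick count are hypothetical.

```scala
import org.nlogo.headless.HeadlessWorkspace

object RunModelHeadless {
  def main(args: Array[String]): Unit = {
    // Creates a workspace with no GUI attached.
    val workspace = HeadlessWorkspace.newInstance
    try {
      // Hypothetical model path; input files are resolved as the model expects.
      workspace.open("my-model.nlogo")
      // Assumes the model defines `setup` and `go` procedures.
      workspace.command("setup")
      workspace.command("repeat 100 [ go ]")
      // Reports a value back to the host program.
      println(workspace.report("count turtles"))
    } finally {
      // Releases the workspace's resources even if a command fails.
      workspace.dispose()
    }
  }
}
```

Any output files the model writes itself (rasters, csv) would still be produced by the model's own code during `go`; the host program only drives the run.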

How can I control JPA data source creation

I have EclipseLink data sources created using WebLogic Server. I have two data sources, one for reads and another for writes. I want the read data source to be loaded by default and the write data source to be created at run time (that is, per request).
Is there any option like that? I have tried with two data sources, but not in the way described above.
Can anyone tell me how to proceed?