Vertex AI AutoML regression - batch prediction error due to datatype mismatch

I trained a Vertex AI AutoML regression model (using the UI).
I ran a Batch Prediction (also with the UI) and it failed because of a datatype mismatch.
The Batch Prediction returned an error table in the export location in BigQuery. The error concerns the datatype of the DISCOUNT_PCT column: in the output from the Batch Prediction, DISCOUNT_PCT is indeed a STRING, but in the table I loaded for the Batch Prediction it is a NUMERIC (as it is in the data I used to train the model).
It looks like the Batch Prediction process somehow changed the datatype of the table I loaded. Why is this happening and how can I solve it?
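One way to double-check the mismatch, and to work around it if the input types really do differ from the training schema, is to inspect the input table's schema with the BigQuery client and point the Batch Prediction at a table whose types match the training data, casting DISCOUNT_PCT explicitly. A rough sketch in Python; the project, dataset, and table names below are placeholders, not the real ones:

from google.cloud import bigquery

client = bigquery.Client()  # assumes default project/credentials

# Check what BigQuery actually reports for the input table's schema
table = client.get_table("my_project.my_dataset.batch_prediction_input")  # placeholder name
for field in table.schema:
    print(field.name, field.field_type)

# If DISCOUNT_PCT is NUMERIC here, one workaround is to feed the Batch Prediction
# an input whose types exactly match the training schema, e.g. via an explicit cast.
client.query("""
    CREATE OR REPLACE TABLE `my_project.my_dataset.batch_prediction_input_cast` AS
    SELECT * REPLACE (CAST(DISCOUNT_PCT AS NUMERIC) AS DISCOUNT_PCT)
    FROM `my_project.my_dataset.batch_prediction_input`
""").result()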

Related

How to export data from the built-in database to Excel after a Parameter Variation run in AnyLogic?

Currently, I am writing various timeMeasureEnd data to Excel files after simulation runs during a Parameters Variation experiment. I am currently using excelFile.setCellValue(root.timeMeasureEnd.distribution.mean(), 1, 1, row + 1); row++; in the "After simulation run" field of the Parameter Variation experiment. How can I export a table from the internal database, for example flowchart_stats_time_in_state_log, which is cumulative over all of the simulation runs of a Parameters Variation experiment?

Am I using too much training data in GEE?

I am running a classification script in GEE and I have about 2100 training points, since my AOI is a region in Italy and has many classes. I receive the following error when I try to save my script:
Script error File too large (larger than 512KB).
I tried removing some of the training data and then it saves. I thought there was no limit in GEE on the number of training points. How can I find out what the limit is so I can adjust my training points, or is there a way to save the script without deleting any points?
Here is the link to my code
The Earth Engine Code Editor “drawing tools” are a convenient, but not very scalable, way to create geometry. The error you're getting is because “under the covers” they actually create additional code that is part of your script file. Not only is this fairly verbose (hence the error you received), it's not very efficient to run, either.
In order to use large training data sets, you will need to create your point data in another tool and upload it (using CSV or SHP files) to become one or more Earth Engine “table” assets, and use those from your script.
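For illustration, a minimal sketch of that workflow, written here with the Earth Engine Python API (the equivalent calls exist in the Code Editor's JavaScript API). The asset ID, imagery, dates, and class property are placeholder assumptions, not taken from the linked script:

import ee
ee.Initialize()

# Table asset created by uploading the training points as a CSV or SHP file (placeholder ID)
training_points = ee.FeatureCollection('users/your_username/italy_training_points')

# Placeholder imagery: a Sentinel-2 median composite over the training area
image = (ee.ImageCollection('COPERNICUS/S2_SR')
         .filterDate('2021-06-01', '2021-09-01')
         .filterBounds(training_points)
         .median())

# Sample the composite at the uploaded points; 'landcover' is an assumed class property
samples = image.sampleRegions(collection=training_points,
                              properties=['landcover'],
                              scale=10)

classifier = ee.Classifier.smileRandomForest(100).train(
    features=samples,
    classProperty='landcover',
    inputProperties=image.bandNames())

classified = image.classify(classifier)

Because the points live in a table asset rather than in the script's geometry imports, the script file stays small no matter how many training points there are.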

Pyspark Linear forecast

I am still new to the world of PySpark and big data.
My problem is related to the linear forecast function and how to derive this data for a larger dataset in PySpark.
Below is the link to the data which I use for the scenario value calculation:
Scenario_Data
Scenario Data with output using return
Based on the expected return, I calculate the scenario value.
For example, if the expected return is 3%, I manually identify the rows which provide the values for X and Y; in this case, 3% falls between 1% and 5%. After identifying these rows manually, I calculate the scenario value using the FORECAST.LINEAR formula in Excel, so in this case of 3% the computed scenario value is -162.5.
The objective is to calculate all of this within PySpark, without the manual effort mentioned above.
Let me know if you need any further details on this query
Thanks a lot in advance for the help
Note: I am using Databricks for this task
Regards
Hitesh
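The interpolation that FORECAST.LINEAR performs with two bracketing points can be expressed directly in PySpark. A rough sketch; the scenario table below uses made-up values (chosen so the 3% case reproduces the -162.5 from the question), and in practice the rows would come from the linked Scenario_Data:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` already exists

# Made-up scenario table: expected return (%) -> scenario value
scenario = spark.createDataFrame(
    [(-5.0, 500.0), (-1.0, 100.0), (1.0, -75.0), (5.0, -250.0), (10.0, -600.0)],
    ["ret_pct", "scenario_value"],
)

target = 3.0  # the expected return we want a scenario value for

# Lower bracket: the row with the largest return <= target
lower = (scenario.filter(F.col("ret_pct") <= target)
         .orderBy(F.col("ret_pct").desc()).limit(1)
         .select(F.col("ret_pct").alias("x1"), F.col("scenario_value").alias("y1")))

# Upper bracket: the row with the smallest return >= target
upper = (scenario.filter(F.col("ret_pct") >= target)
         .orderBy(F.col("ret_pct").asc()).limit(1)
         .select(F.col("ret_pct").alias("x2"), F.col("scenario_value").alias("y2")))

# Two-point linear interpolation, which is what FORECAST.LINEAR reduces to here:
# y = y1 + (target - x1) * (y2 - y1) / (x2 - x1)
result = (lower.crossJoin(upper)
          .withColumn("scenario_value",
                      F.col("y1") + (F.lit(target) - F.col("x1"))
                      * (F.col("y2") - F.col("y1")) / (F.col("x2") - F.col("x1"))))

result.select("scenario_value").show()  # -162.5 with the made-up values above

For many targets at once, the same bracketing can be done with a join between a targets table and the scenario table plus a window ordered by return, instead of the two limit(1) lookups.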

Machine Learning: How to handle discrete and continuous data together

I'm posting to ask whether there are any methodologies or ideas as to how to handle discrete and continuous data together in a classification problem.
In my situation, I have a bunch of independent "batches" that have discrete data. This is process-related data, so for each batch there are separate points. I also have a dataset that varies with time for the same batches; here, however, there are many time observations for every batch. The data sets look like the following:
Data Set 1
Batch 1 DiscreteInfo(1) DiscreteInfo(2) ....... DiscreteInfo(n)
Batch 2 DiscreteInfo(1) DiscreteInfo(2) ....... DiscreteInfo(n)
Batch 3 DiscreteInfo(1) DiscreteInfo(2) ....... DiscreteInfo(n)
Batch 4 DiscreteInfo(1) DiscreteInfo(2) ....... DiscreteInfo(n)
Data Set 2
Batch 1 t(1) TimeData
Batch 1 t(2) TimeData
Batch 1 t(3) TimeData
Batch 1 t(4) TimeData
.
.
.
.
Batch n t(1) TimeData
Batch n t(2) TimeData
Batch n t(3) TimeData
I am trying to classify whether all this data belongs to a 'Good' batch, a 'Bad' batch, or a 'so-so' batch. This is determined by one specific discrete parameter (not used in the data sets).
I'm very new to machine learning; any input or ideas would be appreciated. I'm using the matlab classification learner to try to tackle this problem.
There are a few things that you need to consider while dealing with a classification problem.
Training data. We need training data for classification, i.e. we need all the above-mentioned attributes' values along with the class value, whether it is 'Good', 'Bad', or 'so-so'.
Using this we can train a model, and then, given new data for all the trained attributes, we can predict which class it belongs to.
As far as discrete versus continuous is concerned, there is no difference in the way we handle discrete and continuous data. In fact, for this case we can generate new attributes that are functions of the time variables for a given batch (for example per-batch summary statistics) and then perform the classification. If you provide an instance of the data set, the question can be answered more precisely.
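A minimal sketch of that idea in Python with pandas and scikit-learn (rather than the MATLAB Classification Learner mentioned in the question); the file names, column names, and choice of summary statistics are illustrative assumptions:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Data Set 1: one row per batch with the discrete attributes plus the class label
discrete = pd.read_csv("discrete_per_batch.csv")     # placeholder: Batch, DiscreteInfo1..n, Label
# Data Set 2: many rows per batch, one per time observation
timeseries = pd.read_csv("time_data_per_batch.csv")  # placeholder: Batch, t, TimeData

# Collapse the time dimension into per-batch summary features
time_features = (timeseries.groupby("Batch")["TimeData"]
                 .agg(["mean", "std", "min", "max"])
                 .add_prefix("TimeData_")
                 .reset_index())

# One feature row per batch: discrete attributes + time-series summaries
data = discrete.merge(time_features, on="Batch")
X = data.drop(columns=["Batch", "Label"])
y = data["Label"]  # 'Good' / 'Bad' / 'so-so'

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5))  # rough check of how separable the classes are

Whether simple summary statistics are enough depends on how much of the batch quality is encoded in the shape of the time series; if they are not, more specific features of the curves can be added to the per-batch table in the same way.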

Error using caffe Invalid input size

I tried to train my own neural net using my own image database as described in
http://caffe.berkeleyvision.org/gathered/examples/imagenet.html
However, when I want to check the neural net after training on some standard images using the MATLAB wrapper, I get the following output / error:
Done with init
Using GPU Mode
Done with set_mode
Elapsed time is 3.215971 seconds.
Error using caffe
Invalid input size
I used the MATLAB wrapper before to extract CNN features based on a pretrained model, and it worked. So I don't think the input size of my images is the problem (they are converted to the correct size internally by the function "prepare_image").
Does anyone have an idea what the error could be?
Found the solution: I was referencing the wrong ".prototxt" file (it's a little bit confusing because the files are quite similar).
So for computing features using the MATLAB wrapper, one needs to reference the following two files in "matcaffe_demo.m":
models/bvlc_reference_caffenet/deploy.prototxt
models/bvlc_reference_caffenet/MyModel_caffenet_train_iter_450000.caffemodel
where "MyModel_caffenet_train_iter_450000.caffemodel" is the only file needed which is created during training.
In the beginning I was accidently referencing
models/bvlc_reference_caffenet/MyModel_train_val.prototxt
which was the ".prototxt" file used for training.
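For comparison, the same pairing of deploy definition plus trained weights written with Caffe's Python interface (pycaffe) instead of the MATLAB wrapper; the file paths are the ones listed above:

import caffe

caffe.set_mode_gpu()  # matches the "Using GPU Mode" output above; use caffe.set_mode_cpu() otherwise

# Inference must load the deploy definition, not the train_val prototxt used for training
net = caffe.Net('models/bvlc_reference_caffenet/deploy.prototxt',
                'models/bvlc_reference_caffenet/MyModel_caffenet_train_iter_450000.caffemodel',
                caffe.TEST)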