Keras model pyspark error

I have a Keras model that was pickled as described in the following blog post:
http://zachmoshe.com/2017/04/03/pickling-keras-models.html
What's strange is that when I run the model on an HTML file read in plain Python with open(filename), it works as expected. But when I run it on a file read from PySpark, I get the following error:
AttributeError("'Model' object has no attribute '_feed_input_names'",)

You have to run make_keras_picklable() on each worker as well. Otherwise, the __setstate__ method of the Model object on the worker nodes is not patched, and the model will not be deserialized as expected.
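A minimal sketch of one way to do that, assuming the make_keras_picklable() helper from the blog post lives in a module of your own (hypothetically named keras_pickle_patch.py here) that is shipped to the executors, e.g. with spark-submit --py-files keras_pickle_patch.py:

import pickle
from keras_pickle_patch import make_keras_picklable  # hypothetical module

make_keras_picklable()                           # patch Model on the driver
model_bytes = sc.broadcast(pickle.dumps(model))  # sc: your existing SparkContext

def predict_partition(rows):
    # Re-apply the patch on this worker before unpickling, so that the
    # custom __setstate__ is in place when the broadcast bytes are loaded.
    make_keras_picklable()
    import pickle
    m = pickle.loads(model_bytes.value)
    # rows are assumed to be already-preprocessed numpy arrays
    for row in rows:
        yield m.predict(row)

predictions = rdd.mapPartitions(predict_partition)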

Related

Error of ModelStructure/Outputs when using FMU container

I am trying to combine three FMUs into one FMU that contains all three. Specifically, I have one FMU of a pandapower electricity network and two FMUs that are CSV files converted to FMUs with the PythonFMU tool. All of the FMUs have been tested with FMU Check, and they have been simulated together to verify that everything works.
Then I use the FMPy tool to combine them and successfully export the final FMU.
However, when I try to validate it, I get the following error:
ModelStructure/Outputs must have exactly one entry for each variable with causality="output".
Any idea of what is wrong here?
Your problem seems to be fixed in https://github.com/CATIA-Systems/FMPy/issues/281#issuecomment-879092943
You should try to re-generate the containerized FMU with the development branch of FMPy.
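For example, something like the following should install FMPy from its development branch before you re-generate the FMU (the branch name develop is an assumption; check the linked issue for the exact instructions):
pip install git+https://github.com/CATIA-Systems/FMPy.git@develop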

Spark giving multiple datasource error on saving parquet file

I am trying to learn Spark and Scala. When I try to write my result DataFrame to a Parquet file by calling the parquet method, I get the following error.
Code that fails:
df2.write.mode(SaveMode.Overwrite).parquet(outputPath)
This also fails:
df2.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").mode(SaveMode.Overwrite).parquet(outputPath)
Error log:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Multiple sources found for parquet (org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2, org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat), please specify the fully qualified class name.;
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:707)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:733)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:967)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:304)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:848)
However, if I call another method to perform the save, the code works properly.
This works fine:
df2.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").mode(SaveMode.Overwrite).save(outputPath)
Although I have a workaround for the issue, I'd like to understand why the first approach does not work and how I can fix it.
The versions I am using are:
Scala 2.12.9
Java 1.8
Spark 2.4.4
P.S. This issue only appears with spark-submit.

Darknet model to ONNX

I am currently working with Darknet on YOLOv4, with 1 class.
I need to export those weights to ONNX format for TensorRT inference.
I've tried multiple techniques, using ultralytics to convert or going from TensorFlow to ONNX, but none seems to work. Is there a direct way to do it?
Check this GitHub repo: https://github.com/Tianxiaomo/pytorch-YOLOv4
Running the demo_darknet2onnx.py script you'll be able to generate the ONNX model from the .cfg and .weights darknet files.
Usage example:
python demo_darknet2onnx.py <cfgFile> <weightFile> <imageFile> <batchSize>
You can also choose the batch size for the inference calls of the converted model.
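For example, a call might look like this (the file names are placeholders for your own Darknet files and test image, with a batch size of 1):
python demo_darknet2onnx.py yolov4.cfg yolov4.weights test.jpg 1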
The following repo exports YOLOv3 models from Darknet to ONNX for TensorRT inference. You can use it as a reference for your model.
https://github.com/jkjung-avt/tensorrt_demos/tree/master/yolo
You can also convert Scaled-YOLO models (yolov4, yolov4-csp, yolov4x-mish, yolov4-P5, etc.) to ONNX, and it works perfectly:
https://github.com/linghu8812/tensorrt_inference

Spark word count in Scala (running in Apache Sandbox)

I am trying to do a word count lab in Spark on Scala. I am able to successfully load the text file into a variable (RDD), but when I do the .flatMap, .map, and reduceByKey, I receive the error message in the attached screenshot. I am new to this, so any help would be greatly appreciated.
Your program is failing because it could not find the file on HDFS.
You need to specify the file in the following format:
sc.textFile("hdfs://namenodedetails:8020/input.txt")
You need to give the fully qualified path of the file. Since Spark builds a dependency graph and evaluates it lazily, you only see the error when an action is called.
It is better to debug right after reading the file from HDFS, using the .first or .take(n) methods.

PredictionIO evaluation in classifier

Has anyone managed to run an evaluation correctly using PredictionIO?
I am using the classification template on a server, extended with more attributes. It is trained with a dataset I have and makes predictions well. However, it fails during evaluation, even though all the data is labeled, the same data I use to train the algorithm...
The error:
Exception in thread "main" java.lang.IllegalArgumentException:
requirement failed: RDD[labeledPoints] in PreparedData cannot be
empty. Please check if DataSource generates TrainingData and
Preparator generates PreparedData correctly.
DataSource.scala and Preparator.scala should work as they are.
Thanks for any help
Evaluation (using the command shown in the docs) works with the latest template, given that you set Spark to 1.4.1 in your build.sbt. See this GitHub issue:
https://github.com/PredictionIO/template-scala-parallel-textclassification/issues/2
Finally I got it working by starting all over again. For classification, be sure to follow the guide steps and:
1. Add all the attributes you use for your dataset to the Engine, Evaluation, DataSource and NaiveBayesAlgorithms Scala files.
2. Rename the app name to yours in engine.json and Evaluation.scala.
3. Rebuild the app: pio build --verbose
4. Now you can evaluate: pio eval yourpackagename.AccuracyEvaluation yourpackagename.EngineParamsList