PredictionIO evaluation in classifier - prediction

Has anyone managed to run an evaluation correctly using PredictionIO?
I am using the classification template on a server, extended with more attributes. It is trained with a dataset I obtained and makes predictions well. However, the evaluation fails, even though all my data is labeled; it is the same data I use to train the algorithm.
The error:
Exception in thread "main" java.lang.IllegalArgumentException:
requirement failed: RDD[labeledPoints] in PreparedData cannot be
empty. Please check if DataSource generates TrainingData and
Preparator generates PreparedData correctly.
DataSource.scala and Preparator.scala should work as they are.
Thanks for any help

Evaluation (using the command shown in the docs) works with the latest version, provided that you set Spark to 1.4.1 in your build.sbt. See this GitHub issue:
https://github.com/PredictionIO/template-scala-parallel-textclassification/issues/2

I finally got it working by starting all over again. For classification, be sure to follow the steps in the guide and:
1. Add all the attributes you use from your dataset to the Engine, Evaluation, DataSource and NaiveBayesAlgorithms scala files.
2. Replace the app name with yours in engine.json and Evaluation.scala.
3. Rebuild the app: "pio build --verbose".
4. Now you can evaluate: "pio eval yourpackagename.AccuracyEvaluation yourpackagename.EngineParamsList".

Error of ModelStructure/Outputs by using FMU container

I am trying to combine three FMUs into one FMU that contains all of the three. Specifically, I have one FMU of a pandapower electricity network and 2 FMUs that are CSV files converted to FMUs by using PythonFMU tool. All of the FMUs have been tested by the FMU Check and they have been simulated together to check that everything works fine.
I then use the FMPy tool to combine all of them and successfully export the final FMU.
However, when I try to validate it, I get the following error:
ModelStructure/Outputs must have exactly one entry for each variable with causality="output".
Any idea of what is wrong here?
Your problem seems to be fixed in https://github.com/CATIA-Systems/FMPy/issues/281#issuecomment-879092943
You should try to regenerate the containerized FMU with the development branch of FMPy.

Spark cannot find case class on classpath

I have an issue where Spark is failing to generate code for a case class. Here is the Spark error:
Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 52, Column 43: Identifier expected instead of '.'
Here is the referenced line in the generated code
/* 052 */ private com.avro.message.video.public.MetricObservation MapObjects_loopValue34;
Note that com.avro.message.video.public.MetricObservation is a nested case class that is part of a larger hierarchy. It is also used elsewhere in the code without problems. The pipeline also works fine if I use the RDD API, but I want to use the Dataset API because I want to write the Dataset out as Parquet. Has anyone seen this issue before?
I'm using Scala 2.11 and Spark 2.1.0. I was able to upgrade to Spark 2.2.1 and the issue is still there.
Do you think that SI-7555 or something like it has any bearing on this? I have noticed in the past that Scala reflection has had issues generating TypeTags for statically nested classes. Do you think something like that is going on, or is this strictly a Catalyst issue in Spark? You might want to file a Spark ticket too.
So it turns out that changing the package name of the affected class "fixes" the problem (i.e. makes it go away). I really have no idea why this is, or even how to reproduce it in a small test case. What worked for me was creating a higher-level package. Specifically: com.avro.message.video.public -> com.avro.message.publicVideo.
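One plausible explanation, which the thread itself does not confirm: Spark's code generation emits Java source, and `public` is a reserved word in Java, so a package segment named `public` cannot appear in a generated Java type name even though Scala accepts it. The sketch below checks a fully qualified name for such segments using the JDK's `javax.lang.model.SourceVersion.isKeyword`. The object and method names are made up for illustration.

```scala
import javax.lang.model.SourceVersion

// Hypothetical helper, not part of Spark: lists the dot-separated segments
// of a qualified name that are Java keywords or literals.
object PackageKeywordCheck {
  def reservedSegments(qualifiedName: String): List[String] =
    qualifiedName.split('.').toList.filter(segment => SourceVersion.isKeyword(segment))
}
```

Calling `PackageKeywordCheck.reservedSegments` on the original class name flags the `public` segment, while the renamed `com.avro.message.publicVideo` package yields no flagged segments, which is consistent with the rename making the error go away.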

Spark word count in Scala (running in Apache Sandbox)

I am trying to do a word count lab in Spark with Scala. I am able to load the text file into a variable (RDD) successfully, but when I call .flatMap, .map, and reduceByKey, I receive the attached error message. I am new to this, so any help would be greatly appreciated.
Your program is failing because it could not find the file on Hadoop.
You need to specify the file in the following format:
sc.textFile("hdfs://namenodedetails:8020/input.txt")
You need to give the fully qualified path of the file. Since Spark builds a dependency graph and evaluates it lazily, the error only surfaces when you call an action.
It is easier to debug by calling .first or .take(n) right after reading the file from HDFS.
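To illustrate why a bad input path only fails at the action, here is a minimal sketch using a plain Scala `Iterator` instead of an RDD, so it runs without a cluster. It is an analogy, not Spark code: `failingSource` is a hypothetical stand-in for an unreadable file, and `groupBy` plus a sum plays the role of `reduceByKey`.

```scala
import java.io.FileNotFoundException

object LazyWordCountSketch {
  // Lazy step, like flatMap on an RDD: nothing is read from the source yet.
  def tokenize(lines: Iterator[String]): Iterator[String] =
    lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)

  // Forcing step, like an action: the source is actually consumed here.
  def countWords(lines: Iterator[String]): Map[String, Int] =
    tokenize(lines)
      .map(word => (word, 1))
      .toList
      .groupBy(_._1)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }

  // Hypothetical source that fails as soon as an element is requested.
  def failingSource: Iterator[String] = new Iterator[String] {
    def hasNext: Boolean = true
    def next(): String = throw new FileNotFoundException("input.txt")
  }
}
```

Building the pipeline with `tokenize(failingSource)` succeeds, and the exception appears only once `countWords` forces evaluation. In Spark, calling `.first` on the freshly loaded RDD plays the same forcing role, which is why it exposes a wrong path immediately.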

Apache Spark ALS - how is it solving the least square?

The source code for the Apache Spark ALS can be found here.
I am wondering where the Least Squares solving is going on in this source code? I can't find it for the life of me.
When following a tutorial/walkthrough on Collaborative Filtering, it shows that to run ALS on some ratings you call ALS.train(ratings, rank, numIterations, lambda). Checking the source code, the train function calls the run function, which returns a MatrixFactorizationModel with the predicted ratings in it.
Additionally, the API for ALS (found here) says there is a method called solveLeastSquares but it isn't in the source code found in the first link. I would like to learn how the least squares problem is being solved so I can adjust it as necessary.
From the documentation:
(Breaking change) In ALS, the extraneous method solveLeastSquares has been removed. The DeveloperApi method analyzeBlocks was also removed.
However, you can switch the branch to 1.1, per the docs you referenced, and you will see the solveLeastSquares method.
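For orientation, the least-squares subproblem in textbook ALS is a small ridge regression per user (or per item): with the other side's factors Y held fixed and the observed ratings r, solve (Y^T Y + lambda * I) x = Y^T r. The sketch below does that with plain Gaussian elimination. It is not Spark's implementation, which may scale the regularization and solve the system differently, and all names in it are made up for illustration.

```scala
object AlsLeastSquaresSketch {
  /** Solves (Y^T Y + lambda * I) x = Y^T r for one factor vector x.
    * y: one row of item factors per observed rating; r: the ratings. */
  def solveFactor(y: Array[Array[Double]], r: Array[Double], lambda: Double): Array[Double] = {
    val k = y.head.length
    // Augmented matrix [A | b] with A = Y^T Y + lambda * I and b = Y^T r.
    val a = Array.tabulate(k, k + 1) { (i, j) =>
      if (j < k) y.map(row => row(i) * row(j)).sum + (if (i == j) lambda else 0.0)
      else y.indices.map(n => y(n)(i) * r(n)).sum
    }
    // Forward elimination with partial pivoting.
    for (col <- 0 until k) {
      val pivot = (col until k).maxBy(row => math.abs(a(row)(col)))
      val tmp = a(col); a(col) = a(pivot); a(pivot) = tmp
      for (row <- col + 1 until k) {
        val factor = a(row)(col) / a(col)(col)
        for (j <- col to k) a(row)(j) -= factor * a(col)(j)
      }
    }
    // Back substitution.
    val x = new Array[Double](k)
    for (i <- k - 1 to 0 by -1) {
      val known = (i + 1 until k).map(j => a(i)(j) * x(j)).sum
      x(i) = (a(i)(k) - known) / a(i)(i)
    }
    x
  }
}
```

ALS alternates this solve between user factors and item factors for the requested number of iterations, so this is the step to look for when reading the 1.1 branch.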

Problems compiling routes after migrating to Play 2.1

After migrating to Play 2.1, I ran into a problem: the routes compiler stopped working for my routes file. It was completely fine with Play 2.0.4, but now I get a build error and can't find a workaround for it.
In my project I'm using the cake pattern, so controller actions are visible not through <package>.<controller class>.<action>, but through <package>.<component registry>.<controller instance>.<action>. The new Play routes compiler uses all action path components except the last two to form the package name used in managed sources (as far as I can tell from the code in https://github.com/playframework/Play20/blob/2.1.0/framework/src/routes-compiler/src/main/scala/play/router/RoutesCompiler.scala). In my case this means that <package>.<component registry> is chosen as the package name, which results in an error during the build:
[error] server/target/scala-2.10/src_managed/main/com/grumpycats/mmmtg/componentsRegistry/routes.java:5: componentsRegistry is already defined as object componentsRegistry
[error] package com.grumpycats.mmmtg.componentsRegistry;
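The package derivation described above can be sketched as follows. This mirrors only the description in this post, not the actual RoutesCompiler source, and the controller instance and action names in the usage note (`cardsController`, `index`) are hypothetical.

```scala
object RoutesPackageSketch {
  // Drops the last two dot-separated components (controller and action)
  // and joins what remains into a package name.
  def derivedPackage(actionPath: String): String =
    actionPath.split('.').dropRight(2).mkString(".")
}
```

For a conventional route such as `controllers.Application.index` this gives `controllers`. For a cake-pattern path such as `com.grumpycats.mmmtg.componentsRegistry.cardsController.index` it gives `com.grumpycats.mmmtg.componentsRegistry`, which is the package in the error message above and clashes with the existing componentsRegistry object.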
I made the sample project to demonstrate this problem: https://github.com/rmihael/play-2.1-routes-problem
Is it possible to work around this problem somehow without dropping the cake pattern for controllers? It's a pity that I can't proceed with Play 2.1 because of this.
I can't post a comment because I lack the reputation.
The convention is that classes and objects start with an upper-case letter. This convention is applied to pattern matching as well. Looking at a string, there seems to be no difference between a package object and a normal object (apart from the case). I am not sure how Play 2.1 handles things, which is why this is a comment rather than an answer.
You could try the new @ syntax in the router. That allows you to create the controller instance from the Global class. You would still specify <package>.<controller class>.<action>, but in the Global you get the instance from somewhere else (for example a component registry).
You can find a bit of extra information here under the 'Managed Controller classes instantiation': http://www.playframework.com/documentation/2.1.0/Highlights
This demo project shows its usage: https://github.com/guillaumebort/play20-spring-demo