How to run evolutionary algorithms using PySpark

I want to run evolutionary algorithms like GA and PSO using PySpark on Spark. How can I do this with MLlib and the DEAP Python library? Is there any other library available to perform the same task?

Have a look at my answer on how to use DEAP with Spark and see if it works for you.
Here is an example of how to configure the DEAP toolbox to replace its map function with a custom one that uses Spark.
from pyspark import SparkContext
sc = SparkContext(appName="DEAP")

def sparkMap(algorithm, population):
    # Distribute the individuals, evaluate them on the executors, and collect
    # the results back as a plain list so DEAP can iterate over them.
    return sc.parallelize(population).map(algorithm).collect()

toolbox.register("map", sparkMap)
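To illustrate what DEAP expects back from the registered map, here is a minimal, self-contained sketch (the evaluate function and the plain-list population are made up for illustration): DEAP's algorithms call toolbox.map(toolbox.evaluate, individuals) internally and expect a list of fitness tuples in return. Keep in mind that on a real cluster everything shipped to the executors must be picklable, which is where DEAP's dynamically created individual classes can cause trouble (see the issue linked below).

from pyspark import SparkContext

sc = SparkContext(appName="DEAP", master="local[*]")

def sparkMap(func, iterable):
    # Same idea as above: distribute, apply the function, collect the results.
    return sc.parallelize(list(iterable)).map(func).collect()

# Illustrative fitness function and a population of plain-list individuals.
def evaluate(individual):
    return (sum(individual),)

population = [[0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
print(sparkMap(evaluate, population))  # [(2,), (4,), (1,)]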

https://github.com/DEAP/deap/issues/268 shows how to do this in the DEAP package. It is still an open issue, but it mentions a pull request (https://github.com/DEAP/deap/pull/76), and it seems the fixed code/branch lives in a forked repo. It sounds like rebuilding the package with that code should resolve the issue.
Another resource I found, though I haven't tried it, is https://apacheignite.readme.io/docs/genetic-algorithms. I also came across https://github.com/paduraru2009/genetic-algorithm-with-Spark-for-test-generation.

Related

Nrwl build Library Error: File is not under rootDir

The error occurs when I use the library in another library (Nx/Nrwl v13). The import works fine in the app but not from within other libraries, and I am not able to build the library. All libraries are publishable.
This is very difficult to debug without more information. It can be related to a circular dependency issue: are you sure you don't import code from a library that in turn imports code from the importing library?
A imports B
B imports A
If that is the case, you should handle it by creating a third library C that both A and B import, or by restructuring A or B so they no longer depend on each other.
A code example would make it easier to help you.
From https://github.com/nrwl/nx/issues/10785#issuecomment-1158916416:
There seems to have been an issue with a migration that was scheduled for a version but the migration itself was released in another version, so that might have caused the migration to not run in some scenarios. That migration should have added the following in nx.json for anyone having their nx.json extending from nx/presets/core.json or nx/presets/npm.json:
{
  ...
  "pluginsConfig": {
    "@nrwl/js": {
      "analyzeSourceFiles": true
    }
  }
}
Could you please add the above snippet to your nx.json and try again? If after applying the change it doesn't pick it up immediately, run nx reset and then try again.
This didn't work for me though, so I opened nx issue #11583: library importing other library using wildcard path mapping fails with "is not under 'rootDir'"
I had this issue in one of our monorepos and it was caused by the fact that one of our library names wasn't valid. We had something like @organisation/test-utils/e2e, which we ended up renaming to @organisation/test-utils-e2e (take note of the / usage).

Spark cannot find case class on classpath

I have an issue where Spark is failing to generate code for a case class. Here is the Spark error:
Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 52, Column 43: Identifier expected instead of '.'
Here is the referenced line in the generated code
/* 052 */ private com.avro.message.video.public.MetricObservation MapObjects_loopValue34;
It should be noted that com.avro.message.video.public.MetricObservation is a nested case class that is part of a larger hierarchy, and it is used elsewhere in the code without problems. It should also be noted that this pipeline works fine if I use the RDD API, but I want to use the Dataset API because I want to write the Dataset out as Parquet. Has anyone seen this issue before?
I'm using Scala 2.11 and Spark 2.1.0. I was able to upgrade to Spark 2.2.1 and the issue is still there.
Do you think that SI-7555 or something like it has any bearing on this? I have noticed in the past that Scala reflection has had issues generating TypeTags for statically nested classes. Do you think something like that is going on, or is this strictly a Catalyst issue in Spark? You might want to file a Spark ticket too.
So it turns out that changing the package name of the affected class "fixes" the problem (i.e., makes it go away). I really have no idea why (though one plausible explanation is that public is a reserved word in Java, so the Java code Spark generates cannot reference the fully qualified name com.avro.message.video.public.MetricObservation), nor how to reproduce it in a small test case. What worked for me was creating a higher-level package: specifically, com.avro.message.video.public -> com.avro.message.publicVideo.

How to pass data in Zeppelin to D3.js for Spark visualization

The graph options in Zeppelin are pretty basic, so I am looking for an example of how to do something simple, like a bar chart, with d3.js. From what I can tell it would be the best graphing library for creating stunning graphs.
Anyway, my question is how to pass data to the JavaScript code. With regular Zeppelin charts you write Scala or other code and save the result in a DataFrame. Then in the next paragraph you use the %sql option, write a SQL command, and buttons appear to let you graph the data.
But nothing I have found on the internet indicates that data created in the Scala code section can be passed to the Angular section where you put the d3.js code.
Some examples I found, like this one, put all the HTML and JavaScript in one giant print statement in the Scala code: https://rawkintrevo.org/2016/09/20/gelly-on-apache-flink/
Then there is an example like Using d3.js with Apache Zeppelin, where the Zeppelin paragraph is all JavaScript, but the data is just a locally created array.
So I need (1) an example and (2) some understanding of how RDDs and DataFrames can be passed into the JavaScript code, which of course lives in a different paragraph than the Scala code. How do you bring objects from the Scala section of the notebook into scope for the JavaScript section?
You can refer to the Zeppelin docs for a good getting-started guide to creating custom visualizations. You might also want to check out the code of some of the built-in visualizations.
Regarding how data from DataFrames is passed to JS, I'm pretty sure z.show or %sql triggers dataFrame.take(${zeppelin.spark.maxResult}), which collects the RDD[T] as a Seq[T] on the driver, whose elements are then used to render the graphs.
Alternatively, if you have a JavaScript graph defined in another paragraph, you can also use z.angularBind("values", rdd.take(maxResult)) to send the data to the Angular view. There's a really nice answer here on the subject which might help.
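For the z.angularBind route, here is a minimal sketch of the binding side written as a %pyspark paragraph (the table name, column names and the "values" variable name are made up for illustration, and it assumes a Spark 2.x Zeppelin where the spark session and the ZeppelinContext z are available). A separate %angular paragraph would then read the bound variable and feed it to d3.js.

%pyspark
# Aggregate with Spark, collect a small result to the driver, and bind it to
# an Angular variable that a later %angular paragraph can read.
df = spark.sql("SELECT category, COUNT(*) AS cnt FROM my_table GROUP BY category")
rows = [row.asDict() for row in df.limit(100).collect()]
z.angularBind("values", rows)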
Hope you find this helpful.

How does Server.call work in elixir-mongo?

I'm learning Elixir and attempting to use the elixir-mongo library. During the auth/1 command, the function uses Server.call, piping in the MongoDB request string. Looking at the Mongo.Server class, it does not appear to be an actual GenServer, nor does it have a method matching call/1. How is this working?
With high probability it doesn't work. The Mongo.Server module doesn't export a call function, and there are no macros that would generate it magically. My guess is that the master branch is currently broken. If you are using the library and want to dig into the sources, make sure you are looking at the same tag as the version you are using in your project.
Also, there are no classes and methods in Elixir. There are modules and functions :)

PredictionIO evaluation in classifier

Has anyone managed to run an evaluation correctly using PredictionIO?
I am using the classification template on a server, but with more attributes. It is trained with a dataset I obtained and it makes predictions well. However, it fails when doing the evaluation, even though all the data is labeled, the same data I use to train the algorithm...
The error:
Exception in thread "main" java.lang.IllegalArgumentException:
requirement failed: RDD[labeledPoints] in PreparedData cannot be
empty. Please check if DataSource generates TrainingData and
Preparator generates PreparedData correctly.
DataSource.scala and Preparator.scala should work as they are.
Thanks for any help
Evaluation (using the command shown in the docs) works with the latest template, provided that you set Spark to 1.4.1 in your build.sbt. See this GitHub issue:
https://github.com/PredictionIO/template-scala-parallel-textclassification/issues/2
Finally I got it working by starting all over again. For classification, be sure to follow the guide steps and:
1. Add all the attributes you use from your dataset to the Engine, Evaluation, DataSource and NaiveBayesAlgorithms Scala files.
2. Replace the app name with yours in engine.json and Evaluation.scala.
3. Rebuild the app: "pio build --verbose".
4. Now you can evaluate: "pio eval yourpackagename.AccuracyEvaluation yourpackagename.EngineParamsList".