GraphFrames with PySpark

I want to use GraphFrames with PySpark (currently using Spark v2.3.3, on Google Dataproc).
After installing GraphFrames with
pip install graphframes
I try to run the following code:
from graphframes import *
localVertices = [(1,"A"), (2,"B"), (3, "C")]
localEdges = [(1,2,"love"), (2,1,"hate"), (2,3,"follow")]
v = sqlContext.createDataFrame(localVertices, ["id", "name"])
e = sqlContext.createDataFrame(localEdges, ["src", "dst", "action"])
g = GraphFrame(v, e)
but I get this error:
Py4JJavaError: An error occurred while calling o301.loadClass.
: java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Any ideas how to fix this issue?

To use GraphFrames with Spark, you should install it as a Spark package, not a pip package:
pyspark --packages graphframes:graphframes:0.7.0-spark2.3-s_2.11
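If you would rather not pass the flag on the command line, the same coordinates can be set from inside a script before the session starts. A minimal sketch (same package as above; adjust the version to your Spark/Scala build):
from pyspark.sql import SparkSession

# Equivalent to --packages: ask Spark to fetch GraphFrames at startup.
# This only takes effect if no SparkContext is running yet.
spark = (SparkSession.builder
         .appName("graphframes-demo")
         .config("spark.jars.packages",
                 "graphframes:graphframes:0.7.0-spark2.3-s_2.11")
         .getOrCreate())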

If you are using Jupyter for development, start it through pyspark rather than directly or from Anaconda. That is, open a terminal and run
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
pyspark --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11
This starts Jupyter with the GraphFrames package loaded in the background. If you then import it in your script with from graphframes import *, it will be picked up correctly and run.
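Once the notebook is up, a quick way to confirm that the JVM side of GraphFrames actually loaded is to build a trivial graph. A sketch, using the spark session the pyspark driver already created:
from graphframes import GraphFrame

# One vertex, one self-loop: constructing the GraphFrame raises the same
# Py4JJavaError as in the question if org.graphframes is not on the classpath.
v = spark.createDataFrame([(1, "A")], ["id", "name"])
e = spark.createDataFrame([(1, 1, "self")], ["src", "dst", "action"])
g = GraphFrame(v, e)
g.vertices.show()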

Related

Error when packaging EXE file using Netbeans

I am working on a project in NetBeans and trying to package an EXE file; the error is below. I found some related topics, but none of them worked for me. Can someone help me with this, please?
Thanks.
java.io.IOException: Exec failed with code 2 command [[C:\Program Files (x86)\Inno Setup 6\iscc.exe, /oC:\Users\kenny\Documents\NetBeansProjects\MyRebarsProject\dist\bundles, C:\Users\kenny\AppData\Local\Temp\fxbundler2533082479821554529\images\win-exe.image\MyRebarsProject.iss] in C:\Users\kenny\AppData\Local\Temp\fxbundler2533082479821554529\images\win-exe.image
at com.oracle.tools.packager.IOUtils.exec(IOUtils.java:165)
at com.oracle.tools.packager.IOUtils.exec(IOUtils.java:138)
at com.oracle.tools.packager.IOUtils.exec(IOUtils.java:132)
at com.oracle.tools.packager.windows.WinExeBundler.buildEXE(WinExeBundler.java:697)
at com.oracle.tools.packager.windows.WinExeBundler.bundle(WinExeBundler.java:366)
at com.oracle.tools.packager.windows.WinExeBundler.execute(WinExeBundler.java:173)
at com.sun.javafx.tools.packager.PackagerLib.generateNativeBundles(PackagerLib.java:352)
at com.sun.javafx.tools.packager.PackagerLib.generateDeploymentPackages(PackagerLib.java:319)
at com.sun.javafx.tools.ant.DeployFXTask.execute(DeployFXTask.java:286)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at jdk.nashorn.internal.scripts.Script$75$\^eval\_.:program(<eval>:225)
at jdk.nashorn.internal.runtime.ScriptFunctionData.invoke(ScriptFunctionData.java:637)
at jdk.nashorn.internal.runtime.ScriptFunction.invoke(ScriptFunction.java:494)
at jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:393)
at jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:449)
at jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:406)
at jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:402)
at jdk.nashorn.api.scripting.NashornScriptEngine.eval(NashornScriptEngine.java:155)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264)
at sun.reflect.GeneratedMethodAccessor192.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.tools.ant.util.ReflectUtil.invoke(ReflectUtil.java:109)
at org.apache.tools.ant.util.ReflectWrapper.invoke(ReflectWrapper.java:81)
at org.apache.tools.ant.util.optional.JavaxScriptRunner.evaluateScript(JavaxScriptRunner.java:103)
at org.apache.tools.ant.util.optional.JavaxScriptRunner.executeScript(JavaxScriptRunner.java:67)
at org.apache.tools.ant.taskdefs.optional.Script.execute(Script.java:53)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
at sun.reflect.GeneratedMethodAccessor248.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:435)
at org.apache.tools.ant.Target.performTasks(Target.java:456)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1405)
at org.apache.tools.ant.Project.executeTarget(Project.java:1376)
at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
at org.apache.tools.ant.Project.executeTargets(Project.java:1260)
at org.apache.tools.ant.module.bridge.impl.BridgeImpl.run(BridgeImpl.java:286)
at org.apache.tools.ant.module.run.TargetExecutor.run(TargetExecutor.java:555)
at org.netbeans.core.execution.RunClassThread.run(RunClassThread.java:153)
C:\Users\kenny\Documents\NetBeansProjects\MyRebarsProject\nbproject\build-native.xml:736: Error: Bundler "EXE Installer" (exe) failed to produce a bundle.
BUILD FAILED (total time: 3 seconds)
It's fixed: I had to reinstall Inno Setup 6 and add it to the PATH again.

Cannot run a scala script (fpmax by jdegoes)

Here's a brilliant talk by J. A. De Goes: https://www.youtube.com/watch?v=sxudIMiOo68 - highly recommended for everyone interested in functional programming.
And here's accompanying code gist:
https://gist.github.com/jdegoes/1b43f43e2d1e845201de853815ab3cb9
When I run $ scalac fpmax.scala, it compiles everything into a new directory, fpmax.
But then when I run scala App0, I get this error:
Exception in thread "main" java.lang.NoClassDefFoundError: App0 (wrong name: fpmax/App0)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at scala.reflect.internal.util.ScalaClassLoader.$anonfun$tryClass$1(ScalaClassLoader.scala:45)
at scala.util.control.Exception$Catch.$anonfun$opt$1(Exception.scala:242)
at scala.util.control.Exception$Catch.apply(Exception.scala:224)
at scala.util.control.Exception$Catch.opt(Exception.scala:242)
at scala.reflect.internal.util.ScalaClassLoader.tryClass(ScalaClassLoader.scala:45)
at scala.reflect.internal.util.ScalaClassLoader.tryToLoadClass(ScalaClassLoader.scala:39)
at scala.reflect.internal.util.ScalaClassLoader.tryToLoadClass$(ScalaClassLoader.scala:39)
at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.tryToLoadClass(ScalaClassLoader.scala:125)
at scala.reflect.internal.util.ScalaClassLoader$.classExists(ScalaClassLoader.scala:150)
at scala.tools.nsc.GenericRunnerCommand.guessHowToRun(GenericRunnerCommand.scala:36)
at scala.tools.nsc.GenericRunnerCommand.<init>(GenericRunnerCommand.scala:55)
at scala.tools.nsc.GenericRunnerCommand.<init>(GenericRunnerCommand.scala:18)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:42)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:101)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
I am running Scala 2.12.6.
What would be correct way to run this code snippet?
I had to edit it a bit to run App0.
Change def main: Unit to def main(args: Array[String]): Unit, so that the object has a main method that takes args, as the Scala spec requires.
Let's compile it:
scalac fpmax.scala
Indeed, this creates the fpmax directory with all the classes in it.
Then run App0:
scala fpmax.App0
What is your name?
Alex
Hello, Alex, welcome to the game!
Dear Alex, please guess a number from 1 to 5:
4
You guessed wrong, Alex! The number was: 3
Do you want to continue, Alex?
n
Note that I am NOT changing into the fpmax directory that contains the classes; I run this from its parent.

Spark Submit: Class Not Found Exception

I am trying to submit a job to Spark on my machine, like so:
$ spark-submit --master local --class ai.affable.flint.Foo target/scala-2.11/flint.jar
However, this fails with the following error:
java.lang.ClassNotFoundException: ai.affable.flint.Foo
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
I have verified that the JAR file exists and contains a class called Foo:
$ jar tvf ./target/scala-2.11/flint.jar | grep Foo
2003 Fri Dec 14 20:53:40 MYT 2018 ai/affable/flint/Foo.class
...
This baffles me because:
a) the JAR exists, b) the class exists in the JAR, and c) I have specified the fully qualified name and double-checked for any path errors or misspellings.
Does anyone know what I am missing?
EDIT:
I got it to work by recreating the project in a fresh directory. I literally copy-pasted the code and repeated the steps.
I would still like to know what I can do in situations like this, short of recreating the project.
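One cheap diagnostic in cases like this is to check, with Python's standard library, that the class file really is inside the jar you are submitting (a stale or wrong jar path is a common culprit). A sketch, using the paths from the question:
import zipfile

# A jar is just a zip archive; list its entries and look for the class file
# at the path matching the fully qualified name.
jar_path = "target/scala-2.11/flint.jar"
wanted = "ai/affable/flint/Foo.class"

with zipfile.ZipFile(jar_path) as jar:
    entries = jar.namelist()

print(wanted in entries)                    # should print True
print([n for n in entries if "Foo" in n])   # any Foo-related entries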

Issue with mongodb-spark connector for PySpark (Class not found exception:com.mongodb.spark.sql.DefaultSource)

I am trying to install the MongoDB Spark connector. Everything goes well, but when I run the Spark code, the following error comes up. Please help.
MongoDB Spark Connector - 2.2.2
Spark - 2.2.0
MongoDB - 3.6
18/05/08 11:18:39 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Traceback (most recent call last):
File "/home/cisco/spark-mongo-test.py", line 7, in <module>
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
File "/home/cisco/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 165, in load
File "/home/cisco/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/home/cisco/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/home/cisco/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o39.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.mongodb.spark.sql.DefaultSource. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:549)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:301)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.mongodb.spark.sql.DefaultSource.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21$$anonfun$apply$12.apply(DataSource.scala:533)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21$$anonfun$apply$12.apply(DataSource.scala:533)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21.apply(DataSource.scala:533)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21.apply(DataSource.scala:533)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:53
I think you should add this line to [spark-home]/conf/spark-defaults.conf:
spark.jars.packages org.mongodb.spark:mongo-spark-connector_2.11:2.3.0
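If you would rather not edit spark-defaults.conf, the package can also be requested from the script itself. A sketch, assuming a local MongoDB and a placeholder test.coll collection (the URI and names are illustrative, not from the question):
from pyspark.sql import SparkSession

# spark.jars.packages fetches the connector at startup (set it before the
# JVM exists); spark.mongodb.input.uri tells the connector what to read.
spark = (SparkSession.builder
         .config("spark.jars.packages",
                 "org.mongodb.spark:mongo-spark-connector_2.11:2.3.0")
         .config("spark.mongodb.input.uri",
                 "mongodb://127.0.0.1/test.coll")
         .getOrCreate())

df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df.printSchema()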
For me, the issue was that I did not have the mongo-spark-connector_2.11 jar in my [spark-home]/jars directory. You can download it here:
https://jar-download.com/?search_box=mongo%20spark%20connector

How to compile and run H2 TriggerSample

I copied TriggerSample.java into the directory containing h2-1.3.168.jar. Then:
javac -cp h2-1.3.168.jar TriggerSample.java
creates
TriggerSample$MyTrigger.class and TriggerSample.class
Then:
java TriggerSample
says:
Exception in thread "main" java.lang.NoClassDefFoundError: TriggerSample (wrong name: org/h2/samples/TriggerSample)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
It also does not work with:
java org.h2.samples.TriggerSample
java org/h2/samples/TriggerSample
How exactly to run that example from the command line?
This is a regular Java problem. The class TriggerSample is in the package org.h2.samples, so its source file must live in the matching directory org/h2/samples.
Create a directory org/h2/samples
mkdir org/h2/samples
Move the file TriggerSample.java to that directory
Run
javac -cp h2-1.3.168.jar org/h2/samples/TriggerSample.java
Then run (on Windows, use ; instead of : as the classpath separator):
java -cp h2-1.3.168.jar:. org.h2.samples.TriggerSample
Alternatively, remove the package declaration from TriggerSample.java.