I am trying to get Apache SystemML set up and running (on Ubuntu) in standalone mode.
I am relying on the GitHub documentation to set this up.
I would like to run this with pyspark, and I am following the instructions from this beginner's guide.
After successfully installing systemml and launching the pyspark shell, I tried the following code from the tutorial:
import systemml as sml
import numpy as np
m1 = sml.matrix(np.ones((3,3)) + 2)
The import statements work fine; however, I encounter the following error on the third line:
ImportError: Unable to load systemML.jar into the current pyspark session. Hint: Provide
the following argument to pyspark: --driver-class-path /usr/local...
As per the hint provided, I launched pyspark again with the --driver-class-path argument appended at the end. But I encountered the same error.
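For what it's worth, this is roughly what the relaunch looked like (the jar path is left shortened here, as in the hint; I passed the full path the error printed) and the snippet I then re-ran inside the shell:

# relaunched with (shortened path):
#   pyspark --driver-class-path /usr/local/.../systemml-*-incubating-SNAPSHOT.jar
import systemml as sml
import numpy as np

m1 = sml.matrix(np.ones((3, 3)) + 2)  # still raises the same ImportError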
While googling this, I found this error highlighted in the Apache SystemML documentation. However, I wasn't really able to resolve the issue.
Any help will be greatly appreciated!
Can you please confirm that "/usr/local..." in your comment is the path to systemml-*-incubating-SNAPSHOT.jar and that the file exists?
Related
I am trying to use the calc module in MetPy. I have installed it both via pip and via conda-forge. When I run my script in Spyder, I get the following error:
ImportError: cannot import name 'AuthorityMatchInfo' from 'pyproj._crs' (e.g. example file path: ~/anaconda3/lib/python3.7/site-packages/pyproj/_crs.cpython-37m-x86_64-linux-gnu.so)
I had the same issue twice and solved it by following advice I found on GitHub.
Basically, one needs to uninstall pyproj and reinstall it.
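In case the concrete commands help, the reinstall was roughly this (pip shown; if pyproj came from conda-forge, use conda remove / conda install pyproj instead), followed by a quick import check:

# in a terminal, inside the environment that Spyder uses:
#   pip uninstall pyproj
#   pip install pyproj
# then, in Python, re-check the imports that were failing:
import pyproj
import metpy.calc
print(pyproj.__version__)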
I am trying to run my first PySpark script on a Linux VM I configured. The error message I have is KeyError: SPARK_HOME when I run the following:
from os import environ
from pyspark import SparkContext
I momentarily made this error go away by running export SPARK_HOME=~/spark-2.4.3-bin-hadoop2.7. I then ran into a new error: error=2, No such file or directory. Searching took me to this page: https://community.cloudera.com/t5/Community-Articles/Tutorial-Install-Configure-iPython-and-create-run-PySpark/ta-p/246400. I then ran export PYSPARK_PYTHON=~/python3*. This brings me back to the KeyError: SPARK_HOME error.
Honestly, I'm stumbling through this, because it's my first time configuring Spark and using PySpark. I also still don't quite understand the ins and outs of PyCharm.
I expect to be able to run the basic sample script from this page with no issues: https://medium.com/parrot-prediction/integrating-apache-spark-2-0-with-pycharm-ce-522a6784886f
There is a package called findspark here.
Or you may use the code below to set the path if it is not found in the environment:
import os
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = 'full_path_to_spark_root'
[code continues]
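For this question specifically, a fuller sketch along the same lines that also pins PYSPARK_PYTHON to a concrete interpreter (both paths are placeholders; note that a glob like ~/python3* is not expanded to a real file in an export, which is likely where the error=2, No such file or directory comes from):

import os

# placeholders: point these at your actual Spark root and Python 3 binary
os.environ.setdefault('SPARK_HOME', '/path/to/spark-2.4.3-bin-hadoop2.7')
os.environ.setdefault('PYSPARK_PYTHON', '/usr/bin/python3')  # a concrete path, not a glob

from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # should start without KeyError: SPARK_HOME
print(sc.version)
sc.stop()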
I tried to run pyspark via the terminal. From my terminal, I run snotebook and it automatically loads Jupyter. After that, when I select Python 3, the error comes out in the terminal.
[IPKernelApp] WARNING | Unknown error in handling PYTHONSTARTUP file
/Users/simon/spark-1.6.0-bin-hadoop2.6/python/pyspark/shell.py
Here's my .bash_profile setting:
export PATH="/Users/simon/anaconda/bin:$PATH"
export SPARK_HOME=~/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export PYSPARK_PYTHON=python3
alias snotebook='$SPARK_HOME/bin/pyspark'
Please let me know if you have any ideas, thanks.
You need to add the line below to your environment (e.g. in your .bash_profile):
PYSPARK_DRIVER_PYTHON=ipython
or
PYSPARK_DRIVER_PYTHON=ipython3
Hope it helps.
In my case, I was using a virtual environment and forgot to install Jupyter, so it was using some version that it found in the $PATH. Installing it inside the environment fixed this issue.
Spark now includes PySpark as part of the install, so remove the PySpark library unless you really need it.
Remove the old Spark and install the latest version.
Install the findspark library (via pip).
In Jupyter, import and use findspark:
import findspark
findspark.init()
Quick PySpark / Python 3 Check
import findspark
findspark.init()
from pyspark import SparkContext
sc = SparkContext()
print(sc)
sc.stop()
First of all, I'm aware of the question here, but I couldn't find a satisfying answer there. I don't want to ignore errors or use comments; I want to have the right settings in Eclipse/PyDev. My problem is pretty similar to this one.
I'm using Ubuntu 12.04 and set up a virtualenv for Python 2.7 in my home directory. After installing several Python packages (numpy, scipy, matplotlib, etc.) using pip, I installed Eclipse 4.3 with PyDev.
If I use the system Python interpreter at /usr/bin/python, everything works fine (except that it isn't the one I want to use). However, if I try to set up a Python interpreter using the virtualenv, I first get the warning described here. After clicking "Proceed anyway", it seems to work. So far so good.
However, with e.g. import numpy as np, every np.* call gives the Eclipse/PyDev error Undefined variable from import, and code completion doesn't work properly either. It seems to work for e.g. datetime, but not for numpy, scipy, and matplotlib.
Has anybody figured out how to configure Eclipse correctly?
I already tried adding the numpy path manually to the virtualenv interpreter, but then I get this weird error:
import matplotlib.dates as mpl_dates
  File "/home/pydev/myenv-py27/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 149, in <module>
    import sys, os, tempfile
  File "/usr/lib/python2.7/tempfile.py", line 34, in <module>
    from random import Random as _Random
ImportError: cannot import name Random
The following modules appear to be missing
email.Generator
email.Iterators
email.Utils
win32api
win32con
win32pipe
wx
My setup file looks like this:
from distutils.core import setup
import py2exe
setup(console=['fwsm_migration.py'])
I'm using Python 2.5.4 and py2exe 0.6.8.
I looked here and elsewhere for a solution to this particular problem but have not found one!!
I read about using "options", but being new to Python itself, I don't know where to put them.
Please HELP!
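Regarding the "options" part of the question: in py2exe they are passed to setup() itself via the options argument. A minimal sketch of where they go (the keys and module lists below are only examples; which ones you need depends on what is actually missing from your build):

from distutils.core import setup
import py2exe

setup(
    console=['fwsm_migration.py'],
    options={
        'py2exe': {
            # pull in the whole email package, whose submodules were reported missing
            'packages': ['email'],
            # ignore optional/Windows-extension modules you do not actually use
            'excludes': ['win32api', 'win32con', 'win32pipe', 'wx'],
        }
    },
)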
Try cx_freeze. After having a hell of a time with py2exe, cx_freeze compiled my script without any configuration. In the same environment, py2exe claimed I'd missed nine packages.
For simple scripts you only need to do:
cxfreeze hello.py --target-dir dist
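If you prefer a setup script over the one-off command, cx_freeze supports that too; a minimal sketch (the name, version, and description strings are placeholder metadata), which you then build with python setup.py build:

from cx_Freeze import setup, Executable

setup(
    name='fwsm_migration',  # placeholder metadata
    version='0.1',
    description='console build of fwsm_migration.py',
    executables=[Executable('fwsm_migration.py')],
)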