Running ./pyspark fails to find local directories - pyspark

After installing Spark I am trying to run PySpark from the installation folder:
opt/spark/bin/pyspark
But I get the following errors:
opt/spark/bin/pyspark: line 24: /opt/spark/bin/load-spark-env.sh: No such file or directory
opt/spark/bin/pyspark: line 68: /opt/spark/bin/spark-submit: No such file or directory
opt/spark/bin/pyspark: line 68: exec: /opt/spark/bin/spark-submit: cannot execute: No such file or directory
Why is this happening when I can see these items in their respective directories? I'm also trying to get PySpark to run standalone as a command, but I'd imagine that I must solve the former problem first.
I am running this on macOS.

This error indicates that SPARK_HOME is not set. Try this:
export SPARK_HOME=/opt/spark
pyspark
FYI, on macOS it is strongly recommended to install software with a package manager such as Homebrew (https://brew.sh).
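One thing worth checking: the command in the question was run as the relative path opt/spark/bin/pyspark, while the errors mention the absolute path /opt/spark/bin/..., so SPARK_HOME needs to point at the directory that really contains the install. A minimal sketch for verifying that, assuming the /opt/spark location from the question (check_spark_home is a hypothetical helper, not part of Spark):

```shell
# Hypothetical helper: report which launcher scripts exist under a Spark directory.
check_spark_home() {
  spark_dir="$1"
  missing=0
  for f in bin/pyspark bin/spark-submit bin/load-spark-env.sh; do
    if [ -f "$spark_dir/$f" ]; then
      echo "found:   $spark_dir/$f"
    else
      echo "missing: $spark_dir/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Assumption: /opt/spark is the install location from the question.
check_spark_home "${SPARK_HOME:-/opt/spark}" || echo "Spark files not found; check the path"
```

If any file is reported missing, SPARK_HOME (or the fallback path) is not the real install directory.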

Set the following environment variables (adjust the path and the py4j version to match your install):
export SPARK_HOME=<YOUR-PATH>/spark-2.4.4-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
And if you also want to use it with Jupyter Notebook:
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=python3
export PATH=$SPARK_HOME/bin:$PATH:~/.local/bin:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
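The py4j archive name differs between Spark releases (0.10.7 is the one used above for Spark 2.4.4), so a hardcoded file name can end up pointing at nothing. A sketch for bash that looks up whichever archive is bundled instead; find_py4j_zip is a hypothetical helper name:

```shell
# Hypothetical helper: print the first py4j source zip bundled under a Spark directory.
find_py4j_zip() {
  for zip in "$1"/python/lib/py4j-*-src.zip; do
    # If the glob matched nothing, bash passes the pattern through; -f filters it out.
    if [ -f "$zip" ]; then
      echo "$zip"
      return 0
    fi
  done
  return 1
}

# Only extend PYTHONPATH when SPARK_HOME is set and the archive is found.
if [ -n "${SPARK_HOME:-}" ] && py4j_zip="$(find_py4j_zip "$SPARK_HOME")"; then
  export PYTHONPATH="$SPARK_HOME/python:$py4j_zip${PYTHONPATH:+:$PYTHONPATH}"
fi
```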

set PYSPARK_SUBMIT_ARGS="--name" "PySparkShell" "pyspark-shell" && jupyter notebook

I'm looking to install PySpark on my Windows 10 machine and have been unable to correctly specify the PYSPARK_SUBMIT_ARGS argument.
This is the output I'm seeing when I run the "pyspark" command from Git Bash:
$ pyspark
set PYSPARK_SUBMIT_ARGS="--name" "PySparkShell" "pyspark-shell" && jupyter notebook
I've uninstalled all versions of Java, except version 8. Within my .bashrc file, my path is currently specified as:
export JAVA_HOME="C:\PROGRA~2\Java\jre1.8.0_261"
export PYSPARK_SUBMIT_ARGS="--master local[*] pyspark-shell"
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_DRIVER_PYTHON="jupyter"
export SPARK_HOME="C:/spark/spark-2.4.7-bin-hadoop2.7"
export PATH=$SPARK_HOME/bin:$PATH
And JAVA_HOME is specified within my env variables and set in Path as well.
I would really appreciate any additional troubleshooting techniques!
Thank you so much!!!
Try running it from the Windows Command Prompt instead; I had issues with Git Bash.

MongoDB entries inside my .bash_profile are not recognized (macOS)

I installed MongoDB on my MacBook (Catalina 10.15.3) and placed all the necessary entries in my .bash_profile. However, these entries are not recognized, so for each new terminal I have to enter the following again in order to start MongoDB:
export MONGO_PATH=/Users/codehan/Desktop/MongoDB/mongodb-macos-x86_64-4.2.3
export PATH=$MONGO_PATH/bin:$PATH
Otherwise I always get the following in the terminal:
zsh: command not found: mongo
But as soon as I enter export MONGO_PATH=/Users/codehan/Desktop/MongoDB/mongodb-macos-x86_64-4.2.3 and export PATH=$MONGO_PATH/bin:$PATH in my terminal it works again.
How can I fix this problem?
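Your shell is zsh (note the zsh: prefix in the error message), and zsh does not read .bash_profile. With zsh, the configuration file is $HOME/.zshrc, so put the export lines there.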
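A small sketch for working out which startup file applies to the current login shell. The mapping only covers zsh and bash, the .profile fallback is an assumption, and startup_file_for is a hypothetical helper name:

```shell
# Hypothetical helper: map a shell path (as found in $SHELL) to the startup file
# where PATH exports usually go.
startup_file_for() {
  case "$(basename "$1")" in
    zsh)  echo "$HOME/.zshrc" ;;
    bash) echo "$HOME/.bash_profile" ;;
    *)    echo "$HOME/.profile" ;;  # assumption: generic fallback for other shells
  esac
}

echo "Add the export lines to: $(startup_file_for "${SHELL:-/bin/sh}")"
```

After adding the two export lines from the question to that file, open a new terminal and mongo should be found.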

spark-notebook: command not found

I want to set up spark-notebook on my laptop, following the instructions at http://spark-notebook.io. I ran the command bin/spark-notebook and got:
-bash: bin/spark-notebook: command not found
How can I resolve this? I want to run spark-notebook for Spark standalone and Scala.
You can download
spark-notebook-0.7.0-pre2-scala-2.10.5-spark-1.6.3-hadoop-2.7.2-with-parquet.tgz
Then set the path in your .bashrc.
Example:
$ sudo gedit ~/.bashrc
export SPARK_HOME=/your/path/
export PATH=$PATH:$SPARK_HOME/bin
Then start the notebook with the following command:
$ spark-notebook

Can't run ibm_db in Jupyter Notebook

I am trying to run ibm_db in a jupyter notebook. When I run ibm_db I get the below error.
ImportError Traceback (most recent call last)
in ()
----> 1 import ibm_db
ImportError: dlopen(/Users/myName/anaconda/envs/householding/lib/python3.6/site-packages/ibm_db.cpython-36m-darwin.so, 2): Library not loaded: libdb2.dylib
Referenced from: /Users/myName/anaconda/envs/householding/lib/python3.6/site-packages/ibm_db.cpython-36m-darwin.so
Reason: image not found
When I run os.getcwd() I get '/Users/myName'
What I think is happening is that, because my current directory is two levels down from the start of the path dlopen is looking for, it is failing. I've done some looking around but can't find a way to change where dlopen is looking.
You have to update the DYLD_LIBRARY_PATH environment variable to include
/ibm_db-2.0.8-py3.6-macosx-10.6-intel.egg/clidriver/lib
If you have installed ibm_db-2.0.8 on Python 3.6, run this in the terminal:
export DYLD_LIBRARY_PATH=/Users/myName/anaconda/envs/householding/lib/python3.6/site-packages/ibm_db-2.0.8-py3.6-macosx-10.6-intel.egg/clidriver/lib
It should work like a charm after this.
For reference, check out this: Issues with MAC OS X
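Note that a plain export replaces whatever DYLD_LIBRARY_PATH already held. A sketch of prepending instead, in case other libraries rely on the existing value; prepend_dir is a hypothetical helper, and the path is the one from this answer, so adjust it to the local install:

```shell
# Hypothetical helper: prepend a directory to a colon-separated list,
# leaving the list unchanged when the directory is already present.
prepend_dir() {
  new_dir="$1"
  current="$2"
  case ":$current:" in
    *":$new_dir:"*) echo "$current" ;;
    *) echo "$new_dir${current:+:$current}" ;;
  esac
}

# Path taken from the answer; it depends on the environment and ibm_db version.
clidriver_lib="/Users/myName/anaconda/envs/householding/lib/python3.6/site-packages/ibm_db-2.0.8-py3.6-macosx-10.6-intel.egg/clidriver/lib"
export DYLD_LIBRARY_PATH="$(prepend_dir "$clidriver_lib" "${DYLD_LIBRARY_PATH:-}")"
```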
I was having the same error and found that the installDSDriver script creates a file at /Applications/dsdriver/db2profile stating the below:
# NAME: db2profile
#
# FUNCTION: This script sets up a default database environment for
# Bourne shell or Korn shell users.
#
# This file is tuned for IBM Data Server Driver Package only.
#
# USAGE: . db2profile
# This script can either be invoked directly as above or
# it can be added to the user's .profile file so that the
# database environment is established during login.
#
so I just added the line below to my ~/.bash_profile:
source /Applications/dsdriver/db2profile
Open a new terminal window (or restart) and it should work.
This file exports all the environment variables needed for the db2cli command to work.

Mac OS X 10.11 - Add Postgres to $PATH Unsuccessful

I am trying to install Postgres in order to use Heroku.
I am following the instructions in the Heroku tutorial, and after Postgres installation (which was successful), it says to configure my .bash_profile to allow for Postgres command line functionality.
I am following the instructions here, but I am unable to successfully add this line:
export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/latest/bin
That folder does in fact contain "psql" on my computer, so it should work. Here is my current .bash_profile:
# Setting PATH for Python 2.7
# The orginal version is saved in .bash_profile.pysave
PATH="/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
export PATH
# The next line updates PATH for the Google Cloud SDK.
source '/Users/user/google-cloud-sdk/path.bash.inc'
# The next line enables shell command completion for gcloud.
source '/Users/user/google-cloud-sdk/completion.bash.inc'
I tried to add the Postgres line to the end of that file, but it is not working. After searching online, there does not seem to be consensus on how to add PATHs to .bash_profile. I have tried many versions listed, but none have worked.
Please let me know if I'm doing this incorrectly!
Add this line to the end of your .bash_profile:
export PATH=/Applications/Postgres.app/Contents/Versions/latest/bin:$PATH
This makes the shell search that location for binaries before the rest of the PATH.
Quit all instances of Terminal and open a new one; then it should work.
Try which xxx, where xxx is the name of some binary inside /Applications/Postgres.app/Contents/Versions/latest/bin, and check whether it returns that location.
Tell me if it works.
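The same check can be scripted. This is only a sketch: resolves_in is a hypothetical helper, and the directory is the Postgres.app path from the question:

```shell
# Hypothetical helper: succeed only if a command resolves to a file in the expected directory.
resolves_in() {
  cmd="$1"
  expected_dir="$2"
  found="$(command -v "$cmd" 2>/dev/null)" || return 1
  [ "$(dirname "$found")" = "$expected_dir" ]
}

if resolves_in psql /Applications/Postgres.app/Contents/Versions/latest/bin; then
  echo "psql comes from Postgres.app"
else
  echo "psql is missing or comes from somewhere else: $(command -v psql || echo 'not found')"
fi
```

Run it in a freshly opened terminal so that the updated .bash_profile has been read.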