Connect Spyder IDE to a remote Cloudera environment - PySpark

I have installed Anaconda3 (64-bit), which includes the Spyder IDE, on my local Windows machine. I want to connect Spyder to a Cloudera Hadoop cluster so that I can write PySpark scripts in Spyder and run them on that remote cluster. What steps do I have to follow to connect to the remote cluster and execute PySpark code from my local Windows machine?
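One commonly used approach, sketched here under assumptions, is YARN client mode. Install a local PySpark whose version matches the cluster's Spark. Copy the cluster's client configuration files (core-site.xml, hdfs-site.xml, yarn-site.xml; Cloudera Manager can export these as a client configuration) to the Windows machine. Then point HADOOP_CONF_DIR and YARN_CONF_DIR at that folder before the first SparkSession is created. The folder path, helper names and app name are hypothetical. The sketch also assumes that Windows can reach the cluster's YARN and HDFS ports over the network.

```python
import os
from pathlib import Path

# Client-side files that Spark's YARN mode reads from HADOOP_CONF_DIR / YARN_CONF_DIR.
REQUIRED_CLIENT_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml")


def missing_client_files(conf_dir):
    """Return the required Hadoop client config files that are absent from conf_dir."""
    conf = Path(conf_dir)
    return [name for name in REQUIRED_CLIENT_FILES if not (conf / name).is_file()]


def point_spark_at_cluster(conf_dir):
    """Set the variables Spark consults to locate the remote YARN cluster.

    Call this before the first SparkSession is created: the JVM that PySpark
    launches inherits the process environment at that moment.
    """
    missing = missing_client_files(conf_dir)
    if missing:
        raise FileNotFoundError(f"missing in {conf_dir}: {', '.join(missing)}")
    os.environ["HADOOP_CONF_DIR"] = str(conf_dir)
    os.environ["YARN_CONF_DIR"] = str(conf_dir)


def build_session(app_name="spyder-remote-sketch"):
    """Create a YARN client-mode session (assumes pyspark is installed locally)."""
    from pyspark.sql import SparkSession  # imported lazily so the helpers above work without it

    return SparkSession.builder.master("yarn").appName(app_name).getOrCreate()
```

In Spyder you would call `point_spark_at_cluster(r"C:\hadoop-conf\yarn-conf")` (a hypothetical path) and then `spark = build_session()`. On Windows, the local Spark may additionally need HADOOP_HOME pointing to a folder that contains `bin\winutils.exe`.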

Related

How to integrate Eclipse IDE with a Databricks cluster

I am trying to integrate my Scala Eclipse IDE with my Azure Databricks cluster so that I can run my Spark programs from Eclipse directly on the Databricks cluster.
I followed the official Databricks Connect documentation (https://docs.databricks.com/dev-tools/databricks-connect.html).
I have:
Installed Anaconda.
Installed Python 3.7 and the Databricks Connect library 6.0.1.
Completed the Databricks Connect configuration (the CLI part).
Added the client libraries to the Eclipse IDE.
Set the SPARK_HOME environment variable to the path returned by running 'databricks-connect get-jar-dir' in Anaconda.
I have not set any environment variables other than the one mentioned above.
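The `databricks-connect get-jar-dir` command mentioned above prints the directory that holds the Databricks Connect client JARs. In the legacy Databricks Connect workflow, those JARs are what an IDE project references as external libraries. Here is a small Python sketch that prints that directory and lists its JARs, so you can check that the same files are on the Eclipse build path. The helper names are hypothetical, and the sketch assumes the CLI is on PATH (for example, inside the activated Anaconda environment).

```python
import subprocess
from pathlib import Path


def databricks_connect_jar_dir():
    """Ask the databricks-connect CLI where its client JARs are installed."""
    result = subprocess.run(
        ["databricks-connect", "get-jar-dir"],
        capture_output=True,  # Python 3.7+
        text=True,
        check=True,
    )
    return Path(result.stdout.strip())


def client_jars(jar_dir):
    """List the JAR file names in jar_dir, i.e. what the Eclipse project should reference."""
    return sorted(p.name for p in Path(jar_dir).glob("*.jar"))
```

For example, `print(client_jars(databricks_connect_jar_dir()))` lists the JARs to compare against the project's Java Build Path.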
I need help figuring out what else has to be done to complete this integration, for example how the connection-related environment variables work when the program is run from the IDE.
If someone has already done this successfully, please guide me.
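On the environment-variable side, a program launched from Eclipse only sees the environment Eclipse itself was started with, plus anything set in the run configuration's Environment tab. A variable set in a shell after Eclipse is already open is not visible to it. The legacy Databricks Connect documentation lists connection variables that can be used instead of the file written by `databricks-connect configure`. The names below are taken from that documentation as I recall it and should be verified against the docs for your version. The helper is a hypothetical sketch that reports which of them are set.

```python
import os

# Connection variables listed for legacy Databricks Connect (verify for your version).
CONNECT_VARS = (
    "DATABRICKS_ADDRESS",
    "DATABRICKS_API_TOKEN",
    "DATABRICKS_CLUSTER_ID",
    "DATABRICKS_ORG_ID",
    "DATABRICKS_PORT",
)


def connection_env_report(env=None):
    """Map each connection variable to True if it is set to a non-empty value.

    With no argument this inspects the current process environment, which for
    a program started from Eclipse is the environment Eclipse passed to it.
    """
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in CONNECT_VARS}
```

If none of these variables are set, the client falls back to the settings saved by `databricks-connect configure`.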

Show Zeppelin GUI from host

I installed CentOS 7 in VMware on my Windows machine, installed Zeppelin on the new VM and started the service. Now I want to see the Zeppelin GUI, but typing http://198.168.250.128:8080 in Chrome on Windows didn't work. The GUI of the Spark installation on my CentOS VM (http://198.168.250.128:4040) didn't work either.
Did I forget something?
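To narrow down where the connection fails, it helps to test each port from both sides: from inside the CentOS VM against 127.0.0.1, and from Windows against the VM's address. If the port is reachable inside the VM but not from Windows, the usual suspects are a firewall on the VM (firewalld on CentOS 7) or a service that listens only on localhost. If it fails inside the VM too, nothing is listening on that port. Also note that the Spark UI on port 4040 exists only while a Spark application, such as spark-shell, is running. The helper below is a hypothetical standard-library sketch.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and unreachable hosts
        return False
```

For example, run `port_is_reachable("127.0.0.1", 8080)` inside the VM and `port_is_reachable("198.168.250.128", 8080)` on Windows, then compare the results.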

Using PostgreSQL installed on Windows from inside WSL as well?

I have installed PostgreSQL version 10 on Windows. Since I do web development and use WSL frequently, I would like to use PostgreSQL from within WSL too.
Is there any way to link the PostgreSQL installed on Windows with the WSL environment, so that any new databases created inside WSL are also available in the Windows environment?
1) PostgreSQL 10 installed in the Windows environment.
2) PostgreSQL 9.5 installed inside the WSL environment.
My question is: how do I configure both Windows and WSL to use the same PostgreSQL instance?
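One way to share a single instance is to treat the Windows server as a network server and connect to it from WSL. Under WSL 1, Windows services are reachable on localhost. Under WSL 2's default NAT networking, the Windows host is commonly reached through the nameserver address in /etc/resolv.conf. In that case the Windows server must accept the connection: set listen_addresses in postgresql.conf, add a matching host line in pg_hba.conf, and allow port 5432 through the Windows firewall. Stopping the WSL-side 9.5 server also avoids confusion over which server is answering. The helpers below are a hypothetical sketch that builds a libpq connection URI for psql or a driver.

```python
from urllib.parse import quote


def windows_host_from_resolv_conf(text):
    """Return the first nameserver address in /etc/resolv.conf content, or None.

    Under WSL 2 with default NAT networking this is usually the Windows host;
    under WSL 1 you can use "localhost" instead.
    """
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            return parts[1]
    return None


def libpq_uri(user, host, port=5432, dbname="postgres"):
    """Build a libpq connection URI, percent-encoding the user and database names."""
    return f"postgresql://{quote(user, safe='')}@{host}:{port}/{quote(dbname, safe='')}"
```

For example, read `/etc/resolv.conf` inside WSL, pass the result to `libpq_uri("postgres", host)`, and hand the URI to `psql`.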

Hadoop learning development workflow with Eclipse and AWS

I have installed a single-node Hadoop 2.8 on an AWS free-tier nano instance, and I have a local Windows machine with Eclipse on it. What is a good learning workflow? I am not sure of the capabilities of AWS or Hadoop. Should I write code in Eclipse locally, build a JAR, transfer it to the AWS machine and run it there?
If I write the code and create the JAR on my local machine, do I need Hadoop installed locally? How should I go about it? And what is a good learning path, from installation to being comfortable working with Hadoop?
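The build-locally, copy, run-remotely workflow from the question can be scripted. Compiling the JAR locally needs only the Hadoop client libraries on the build path (for example, through a Maven dependency), not a running Hadoop installation. The sketch below builds the two commands. The key file, user, host, JAR path and main class are hypothetical placeholders, and the sketch assumes `scp` and `ssh` are available locally (recent Windows 10 builds ship an OpenSSH client).

```python
from pathlib import PureWindowsPath


def deploy_commands(jar, key_file, user, host, main_class, args=()):
    """Return (copy, run): the scp command for a locally built JAR and the ssh command that runs it.

    The JAR is copied into the remote home directory, and the job is launched with
    `hadoop jar <jar> <mainClass> [args...]`.
    """
    remote = f"{user}@{host}"
    jar_name = PureWindowsPath(jar).name  # accepts both / and \ separators
    copy = ["scp", "-i", key_file, jar, f"{remote}:{jar_name}"]
    run = ["ssh", "-i", key_file, remote, "hadoop", "jar", jar_name, main_class, *args]
    return copy, run
```

Each list can be passed to `subprocess.run(cmd, check=True)`: run `copy` after every local build, then run `run`.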

Spark setup on Windows

I am trying to set up Spark on my Windows 10 PC. After running the spark-shell command, I got the following error:
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect
Installing Spark on a Windows machine is not very difficult, but some permissions and configuration settings need attention during the installation. Please follow the link below for step-by-step installation and configuration of Spark and Scala on a Windows machine.
Apache Spark Installation on windows10
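For this particular error, the permissions and configuration involved usually come down to two things. First, HADOOP_HOME must point to a folder whose bin subfolder contains winutils.exe (a Windows helper binary matching the Hadoop version Spark was built for). Second, the Hive scratch directory (commonly \tmp\hive on the drive Spark is started from) must be writable. The hypothetical sketch below checks the first and builds the command commonly used for the second. C:\tmp\hive is an assumed default location.

```python
from pathlib import Path, PureWindowsPath


def winutils_problems(hadoop_home):
    """Return a list of likely setup problems behind the HiveSessionState error on Windows."""
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]
    winutils = Path(hadoop_home) / "bin" / "winutils.exe"
    if not winutils.is_file():
        return [f"{winutils} not found"]
    return []


def hive_scratch_fix_command(hadoop_home, scratch_dir=r"C:\tmp\hive"):
    """Build the winutils command often used to make the Hive scratch directory writable."""
    winutils = PureWindowsPath(hadoop_home) / "bin" / "winutils.exe"
    return [str(winutils), "chmod", "-R", "777", scratch_dir]
```

For example, run `winutils_problems(os.environ.get("HADOOP_HOME"))` first. Once it returns an empty list, run the command from `hive_scratch_fix_command(...)` before starting spark-shell again.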