My PySpark cluster is installed on one server, say 10.45.25.30, and I want to run my PySpark code on another server, say 10.45.32.67, by connecting to the PySpark cluster installed on 10.45.25.30.
How can I connect to the PySpark cluster installed on the other server from my current server?
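A minimal sketch of one common approach, assuming the machine at 10.45.25.30 runs a standalone Spark master on the default port 7077 and that compatible Spark and Python versions are installed on both servers (the master URL and port here are assumptions, not confirmed details of the setup):

from pyspark.sql import SparkSession

# Point the driver on this server at the remote standalone master (assumed port 7077).
spark = (
    SparkSession.builder
    .master("spark://10.45.25.30:7077")
    .appName("remote-cluster-test")
    .getOrCreate()
)

# Quick check that the session can actually run a job on the remote cluster.
print(spark.range(10).count())
spark.stop()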
I am kind of stuck in a situation where I want to use a Python script to connect to a VPN and then connect to the Mongo client. Our company has given me an .ovpn file which I connect to using the OpenVPN Connect client. After the VPN connection is made, I connect to the Mongo client/server with the pymongo module in Python, everything works fine, and I can run all the relevant scripts on the DB.
I want to ask if it is possible to NOT use the OpenVPN Connect client (because it routes my whole internet connection through the VPN, which is what I don't want) and somehow use the .ovpn file in a Python script to connect to the Mongo server. I heard that we can use the subprocess module to connect using an .ovpn file, but I don't know how.
I am on a Windows computer, if that information helps. I hope I got the message across. Any help will be appreciated.
Here is the script I wrote, but it gave me a server timeout error.
import subprocess
import pymongo

# Run OpenVPN with the .ovpn config; subprocess.run() waits for the command to finish.
subprocess.run(["openvpn", "--config", "test-vpn.ovpn"], shell=True)

# Then try to connect to MongoDB; this is where the server timeout occurs.
client = pymongo.MongoClient('CONNECTIONSTRING')
client.list_database_names()
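For reference, a minimal sketch of a non-blocking variant, assuming openvpn is on PATH and that the tunnel needs some time to come up before pymongo connects; the 15-second wait and the timeout value are assumptions, not verified settings:

import time
import subprocess
import pymongo

# Start OpenVPN without waiting for it to exit; subprocess.run() blocks until the
# command finishes, and the tunnel process keeps running, so the Mongo connection
# below would otherwise never be attempted while the VPN is up.
vpn = subprocess.Popen(
    ["openvpn", "--config", "test-vpn.ovpn"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)

try:
    # Crude wait for the tunnel; parsing the OpenVPN log for
    # "Initialization Sequence Completed" would be more robust.
    time.sleep(15)

    client = pymongo.MongoClient("CONNECTIONSTRING", serverSelectionTimeoutMS=10000)
    print(client.list_database_names())
finally:
    vpn.terminate()  # tear the tunnel down when done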
I'm new to PySpark and I want to connect to a remote Hadoop cluster (CDP) from a Linux server by using the spark-submit command.
Any help would be appreciated.
I need a spark-submit command to connect to the remote CDP cluster.
You can use Apache Livy to submit remote jobs to a CDP cluster. Here is detailed info on how to install and use Livy to submit jobs:
After downloading and unzipping Livy, add the following lines to the livy.conf file, then start the Livy service.
livy.spark.master = yarn
livy.spark.deploy-mode = cluster
You can find examples of how to create a Spark submit script at the following links; a sketch of the REST call is also shown after them:
https://community.cloudera.com/t5/Community-Articles/Submit-a-Spark-Job-to-CDP-Data-Hub-using-the-Livy-REST-API/ta-p/322481
https://livy.apache.org/examples/
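A minimal sketch of submitting a batch through the Livy REST API with Python requests, assuming Livy listens on its default port 8998 on the edge node and that the application file is already on HDFS; the host name, paths and conf values below are placeholders:

import json
import time
import requests

LIVY_URL = "http://cdp-edge-node:8998"  # placeholder Livy host

# Describe the PySpark application to run; file path and conf are placeholders.
payload = {
    "file": "hdfs:///user/myuser/jobs/my_job.py",
    "conf": {"spark.executor.memory": "2g"},
}

resp = requests.post(
    LIVY_URL + "/batches",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
batch = resp.json()
print("Submitted batch", batch["id"], "state:", batch["state"])

# Poll the batch state until the job finishes.
state = batch["state"]
while state not in ("success", "dead", "killed"):
    time.sleep(5)
    state = requests.get(LIVY_URL + "/batches/%d/state" % batch["id"]).json()["state"]
print("Final state:", state)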
I'm trying to get RabbitMQ to monitor a PostgreSQL database and create a message queue entry when database rows are updated. The eventual plan is to feed this message queue into an AWS EKS (Elastic Kubernetes Service) cluster as a job.
I've read many approaches to this, but they are still confusing to a newcomer to RabbitMQ, and many seem to have been written more than 5 years ago, so I'm not sure if they'll still work with current versions of Postgres and RabbitMQ.
I've followed this guide about installing the area51/notify-rabbit Docker container, which can connect the two via a Node app, but when I ran the Docker container it immediately stopped and didn't seem to do anything.
There is also this guide, which uses a Go app to connect the two, but I'd rather not use Go outside of a Docker container.
Additionally, there is this method of installing the pg_amqp extension from a repository which hasn't been updated in years, which allows a direct connection from PostgreSQL to RabbitMQ. However, when I followed it and attempted to install pg_amqp on my Postgres DB (PostgreSQL 12), I was unable to connect to the database using psql, getting the classic error:
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
My current setup is that I have a RabbitMQ server installed in a Docker container on an AWS EC2 instance, which I can access via the internet. I ran the following to install and run it:
docker pull rabbitmq:3-management
docker run --rm -p 15672:15672 -p 5672:5672 rabbitmq:3-management
The postgresql database is running on a separate EC2 instance and both instances have the required ports open for accessing data from each server.
I have also looked into using Amazon SQS for this, but I didn't find any info on linking PostgreSQL up to it. I haven't really seen any guides or Stack Overflow questions on this since 2017/18, so I'm wondering if this is still the best way to create a message broker for a Kubernetes system. Any help/pointers on this much appreciated.
In the end, I decided the best thing to do was to write some simple Python scripts to do the LISTEN/NOTIFY steps and route traffic from PostgreSQL to RabbitMQ, based on the following code: https://gist.github.com/kissgyorgy/beccba1291de962702ea9c237a900c79
I set it up inside Docker containers and run them in my Kubernetes cluster, so they benefit from automatic restarts if they fail.
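For anyone looking for the shape of those scripts, here is a minimal sketch of the bridge, assuming psycopg2 and pika are installed; the connection strings, channel and queue names are placeholders, and the PostgreSQL trigger that issues NOTIFY on row updates is set up separately:

import select

import psycopg2
import pika

PG_DSN = "dbname=mydb user=myuser host=10.0.0.1"     # placeholder
RABBIT_URL = "amqp://guest:guest@10.0.0.2:5672/%2F"  # placeholder
PG_CHANNEL = "row_updates"   # NOTIFY channel used by the trigger
QUEUE_NAME = "row_updates"   # RabbitMQ queue fed to the EKS job

# Listen for NOTIFY events on the PostgreSQL channel.
pg_conn = psycopg2.connect(PG_DSN)
pg_conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = pg_conn.cursor()
cur.execute("LISTEN %s;" % PG_CHANNEL)

# Publish each notification payload to RabbitMQ.
rabbit_conn = pika.BlockingConnection(pika.URLParameters(RABBIT_URL))
channel = rabbit_conn.channel()
channel.queue_declare(queue=QUEUE_NAME, durable=True)

while True:
    # Block until the PostgreSQL socket has something to read (60 s timeout).
    if select.select([pg_conn], [], [], 60) == ([], [], []):
        continue
    pg_conn.poll()
    while pg_conn.notifies:
        notify = pg_conn.notifies.pop(0)
        channel.basic_publish(
            exchange="",
            routing_key=QUEUE_NAME,
            body=notify.payload,
            properties=pika.BasicProperties(delivery_mode=2),  # persistent message
        )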
I'm trying to implement a small API in Docker, and I need that API to write to a database which is hosted on the same server but running on Windows Server 2006.
I can't change the OS on the server because that server also works as a gateway for Power BI.
Should I mount the volume into the container (I'm guessing C:/mongodb/data), or should I make the inserts over localhost?
These are my limitations:
host: running Windows Server 2006 (can't change this)
app: a container running in Windows Subsystem for Linux (it has to run on Linux because I need async functions and I only have knowledge of Python/Node.js), but it has to persist the data in the Mongo database running on the host
mongo database: it has to be running on Windows Server because a Power BI gateway is running and consuming its data
Keeping with diagrams, maybe this will help explain it in a better way.
As far as I understand, your system is as in the picture. You want to write data to MongoDB. There should be a network bridge connecting the host and the Linux environment, so you can access MongoDB via the bridge IP. Running another MongoDB inside the container and mounting the data directory from the host is not reliable, because the two instances may conflict over the data files.
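A minimal sketch of writing from the Linux container to the MongoDB instance on the Windows host with pymongo; host.docker.internal resolves to the host on Docker Desktop for Windows, otherwise substitute the bridge IP mentioned above (the database and collection names here are made up):

from pymongo import MongoClient

# "host.docker.internal" reaches the Windows host from a Docker Desktop container;
# replace it with the host's bridge IP if that name does not resolve in your setup.
client = MongoClient("mongodb://host.docker.internal:27017/")

db = client["powerbi_data"]               # made-up database name
db["readings"].insert_one({"value": 42})  # made-up collection and document
print(db["readings"].count_documents({}))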
Well, my previous attempt to connect PyCharm from my laptop to the remote server did not see any ray of hope because of TCP/IP issues (which, honestly, I could not understand much and am still battling with), so I am looking to IPython as an alternative.
Question: How can I configure IPython on my laptop to point to the remote CentOS 6 server for code processing and execution?
Use Case: I want to use my laptop (running Windows 7 Professional) to connect to the CentOS 6.4 master server using IPython.
Objective: To write the code in IPython on the laptop and then send the job to the server, which will do the processing and should then return the result back to the laptop or to any other visualizing API.
The server and 3 namenodes already have PySpark installed, and I have checked that PySpark works in standalone mode on all four servers. PySpark works in standalone mode on my laptop too.
Current setup: I use SSH to access the server. Python 2.6 is installed on the server and the nodes. I am able to run PySpark on all 4 servers in standalone mode.
Any pointers will be helpful.
You have to start the IPython Notebook server on one cluster node and then connect to that node's URL. To do this, you create a profile where you can specify hostname, port and so on. Have a look at this:
http://ipython.org/ipython-doc/1/interactive/public_server.html
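A minimal sketch of the relevant settings from that guide, assuming IPython 1.x: create a profile on the cluster node, then edit its ipython_notebook_config.py (the profile name, port and password hash below are placeholders):

# On the cluster node:
#   ipython profile create pyspark
#   # then edit ~/.ipython/profile_pyspark/ipython_notebook_config.py:

c = get_config()

c.NotebookApp.ip = '*'                # listen on all interfaces, not just localhost
c.NotebookApp.open_browser = False    # do not try to open a browser on the server
c.NotebookApp.port = 8888             # pick an open port on the cluster node
c.NotebookApp.password = u'sha1:...'  # hash generated with IPython.lib.passwd()

After starting the notebook server with this profile on the cluster node, you browse to http://<cluster-node>:8888 from the laptop and the code runs on the server.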