I'm running the following PySpark code, with a connection to MongoDB:
sparkConf = SparkConf().setMaster("local").setAppName("MongoSparkConnectorTour").set("spark.app.id", "MongoSparkConnectorTour")
# If executed via pyspark, sc is already instantiated
sc = SparkContext(conf=sparkConf)
sqlContext = SQLContext(sc)
# create and load dataframe from MongoDB URI
df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource")\
.option("spark.mongodb.input.uri", config.MONGO_URL_AUTH + "/spark.times")\
.load()
inside a Docker image with
CMD [ "spark-submit" \
, "--conf", "spark.mongodb.input.uri=mongodb://root:example#mongodb:27017/spark.times" \
, "--conf", "spark.mongodb.output.uri=mongodb://root:example#mongodb:27017/spark.output" \
, "--packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.4.1" \
, "./spark.py" ]
config.MONGO_URL_AUTH is mongodb://root:example@mongodb:27017
but I'm getting an exception when it runs:
db_1 | 2019-10-09T13:44:34.354+0000 I ACCESS [conn4] Supported SASL mechanisms requested for unknown user 'root@spark'
db_1 | 2019-10-09T13:44:34.378+0000 I ACCESS [conn4] SASL SCRAM-SHA-1 authentication failed for root on spark from client 172.22.0.4:49302 ; UserNotFound: Could not find user "root" for db "spark"
pyspark_1 | Traceback (most recent call last):
pyspark_1 | File "/home/ubuntu/./spark.py", line 35, in <module>
pyspark_1 | .option("spark.mongodb.input.uri", config.MONGO_URL_AUTH + "/spark.times")\
pyspark_1 | File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/sql/readwriter.py", line 172, in load
pyspark_1 | return self._df(self._jreader.load())
pyspark_1 | File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
pyspark_1 | File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
pyspark_1 | return f(*a, **kw)
pyspark_1 | File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
pyspark_1 | py4j.protocol.Py4JJavaError: An error occurred while calling o34.load.
pyspark_1 | : com.mongodb.MongoSecurityException: Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='root', source='spark', password=<hidden>, mechanismProperties={}}
pyspark_1 | at com.mongodb.internal.connection.SaslAuthenticator.wrapException(SaslAuthenticator.java:173)
Everything in this setup works flawlessly if I don't set a user and password on the MongoDB container and just connect with the mongodb://mongodb:27017 address. With pymongo alone I can also connect using the password, so something is wrong with my Spark-to-MongoDB configuration when a password is used, and I can't understand what.
setup for mongodb (part of docker-compose file):
db:
image: mongo
restart: always
networks:
miasnet:
aliases:
- "miasdb"
environment:
MONGO_INITDB_ROOT_USERNAME: root
MONGO_INITDB_ROOT_PASSWORD: example
MONGO_INITDB_DATABASE: spark
ports:
- "27017:27017"
volumes:
- /data/db:/data/db
https://hub.docker.com/_/mongo reads:
MONGO_INITDB_ROOT_USERNAME, MONGO_INITDB_ROOT_PASSWORD
These variables, used in conjunction, create a new user and set that user's password. This user is created in the admin authentication database and given the role of root, which is a "superuser" role.
You don't specify the authentication database, so mongo defaults to the database named in the connection string, which is spark in your case.
You need to specify the admin auth database in the connection string:
spark.mongodb.input.uri=mongodb://root:example@mongodb:27017/spark.times?authSource=admin
spark.mongodb.output.uri=mongodb://root:example@mongodb:27017/spark.output?authSource=admin
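For example, a minimal sketch of the corrected read from the question's code, assuming the same container alias (mongodb) and the root/example credentials created by the compose file:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

sparkConf = SparkConf().setMaster("local").setAppName("MongoSparkConnectorTour")
sc = SparkContext(conf=sparkConf)
sqlContext = SQLContext(sc)

# authSource=admin makes the driver authenticate against the admin database,
# where MONGO_INITDB_ROOT_USERNAME/PASSWORD created the root user.
uri = "mongodb://root:example@mongodb:27017/spark.times?authSource=admin"

df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("spark.mongodb.input.uri", uri) \
    .load()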
Related
I'm having trouble getting my code to work in Docker. Could you please help me?
I'm running my application in Docker together with Mongo in another container, but when I run the file inside Docker it doesn't connect to the Mongo instance in the other container and fails with a timeout.
My docker-compose file:
version: "3.4"
services:
mongo_db:
image: mongo:6.0
ports:
- "27017:27017"
volumes:
- ./mongo_db:/data/db
container_name: mongo_db
mongo_app:
image: mongo_img_db:latest
links:
- mongo_db
command: python3 /app/main.py
container_name: mongo_app
Error:
### Connection: Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'app')
Traceback (most recent call last):
File "/app/main.py", line 216, in <module>
db_calc.update_storage_stats(db_calc.calculate_artifacts_size())
File "/app/main.py", line 43, in calculate_artifacts_size
artifacts_size = self.db_client.builds.aggregate(
File "/usr/local/lib/python3.9/site-packages/pymongo/collection.py", line 2428, in aggregate
with self.__database.client._tmp_session(session, close=False) as s:
File "/usr/local/lib/python3.9/contextlib.py", line 119, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1757, in _tmp_session
s = self._ensure_session(session)
File "/usr/local/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1740, in _ensure_session
return self.__start_session(True, causal_consistency=False)
File "/usr/local/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1685, in __start_session
self._topology._check_implicit_session_support()
File "/usr/local/lib/python3.9/site-packages/pymongo/topology.py", line 538, in _check_implicit_session_support
self._check_session_support()
File "/usr/local/lib/python3.9/site-packages/pymongo/topology.py", line 554, in _check_session_support
self._select_servers_loop(
File "/usr/local/lib/python3.9/site-packages/pymongo/topology.py", line 238, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 63542415a868649d12f2a966, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused')>]>
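The traceback shows the client pointing at host=['localhost:27017']; inside the mongo_app container that address refers to the container itself, not to the mongo_db service. A minimal pymongo sketch of the likely fix, assuming the compose service name mongo_db from the file above (not a verified answer for this exact setup):

from pymongo import MongoClient

# Assumption: use the compose service name instead of localhost; it resolves on the
# shared Docker network, and the container port stays 27017 (no remapping needed).
client = MongoClient('mongodb://mongo_db:27017/', serverSelectionTimeoutMS=5000)
db = client['app']  # 'app' is the database name shown in the traceback
print(client.admin.command('ping'))  # raises ServerSelectionTimeoutError if unreachable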
I've been using Apache Spark (PySpark) to read from MongoDB Atlas. I have a shared (free) cluster, which has a limit of 512 MB of storage.
I'm trying to migrate to serverless, but I'm somehow unable to connect to the serverless instance. The error is:
pyspark.sql.utils.IllegalArgumentException: requirement failed: Invalid uri: 'mongodb+srv://vani:<password>@versa-serverless.w9yss.mongodb.net/versa?retryWrites=true&w=majority'
Please note:
I'm able to connect to the instance using pymongo, but not using pyspark.
Here is the PySpark code (not working):
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("MongoDB operations").getOrCreate()
print(" spark ", spark)
# cluster0 - is the free version, and i'm able to connect to this
# mongoConnUri = "mongodb+srv://vani:password#cluster0.w9yss.mongodb.net/?retryWrites=true&w=majority"
mongoConnUri = "mongodb+srv://vani:password#versa-serverless.w9yss.mongodb.net/?retryWrites=true&w=majority"
mongoDB = "versa"
collection = "name_map_unique_ip"
df = spark.read\
.format("mongo") \
.option("uri", mongoConnUri) \
.option("database", mongoDB) \
.option("collection", collection) \
.load()
Error :
22/07/26 12:25:36 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
22/07/26 12:25:36 INFO SharedState: Warehouse path is 'file:/Users/karanalang/PycharmProjects/Versa-composer-mongo/composer_dags/spark-warehouse'.
spark <pyspark.sql.session.SparkSession object at 0x7fa1d8b9d5e0>
Traceback (most recent call last):
File "/Users/karanalang/PycharmProjects/Kafka/python_mongo/StructuredStream_readFromMongoServerless.py", line 30, in <module>
df = spark.read\
File "/Users/karanalang/Documents/Technology/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 164, in load
File "/Users/karanalang/Documents/Technology/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1309, in __call__
File "/Users/karanalang/Documents/Technology/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 117, in deco
pyspark.sql.utils.IllegalArgumentException: requirement failed: Invalid uri: 'mongodb+srv://vani:password@versa-serverless.w9yss.mongodb.net/?retryWrites=true&w=majority'
22/07/26 12:25:36 INFO SparkContext: Invoking stop() from shutdown hook
22/07/26 12:25:36 INFO SparkUI: Stopped Spark web UI at http://10.42.28.205:4040
pymongo code (I am able to connect using the same URI):
from pymongo import MongoClient, ReturnDocument
# from multitenant_management import models
client = MongoClient("mongodb+srv://vani:password#versa-serverless.w9yss.mongodb.net/vani?retryWrites=true&w=majority")
print(client)
all_dbs = client.list_database_names()
print(f"all_dbs : {all_dbs}")
Any ideas how to debug/fix this?
Thanks in advance!
I'm having trouble getting Flask to connect to the Mongo database. I was able to connect to the database via PyCharm's Database tool; additionally, I downloaded MongoDB Compass to check whether I could also connect from there, and I could. I installed the Mongo server in containers, not locally.
This is my Dockerfile and docker-compose setup:
Dockerfile:
FROM python:3.10
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
WORKDIR /code
COPY Pipfile Pipfile.lock /code/
RUN pip install pipenv && pipenv install --system
CMD python main.py
docker-compose.yaml:
version: "3.8"
services:
db_mongo:
image: mongo:5.0
container_name: mongo
ports:
- "27018:27017"
volumes:
- ./init-mongo.js:/docker-entrypoint-initdb.d/init-mongo.js:ro
- ./mongo-volume:/data/db
env_file:
- ./env/.env_database
backend:
build: .
volumes:
- .:/code/
ports:
- '8100:5000'
container_name: flask_api_container
depends_on:
- db_mongo
init-mongo.js:
db.log.insertOne({"message": "Database created."});
db = db.getSiblingDB('admin');
db.auth('mk', 'adminPass')
db = db.getSiblingDB('flask_api_db');
db.createUser(
{
user: 'flask_api_user',
pwd: 'dbPass',
roles: [
{
role: 'dbOwner',
db: 'flask_api_db'
}
]
}
)
db.createCollection('collection_test');
I copied the URI from MongoDB Compass (I connected to the database with this address). I have tried various combinations of this address.
main.py:
import pymongo
from flask import Flask, Response, jsonify
from flask_pymongo import PyMongo
app = Flask(__name__)
uri = 'mongodb://flask_api_user:dbPass@db_mongo:27018/?authMechanism=DEFAULT&authSource=flask_api_db'
client = pymongo.MongoClient(uri)
client.admin.command('ismaster') # to check if the connection has been established - show errors in terminal
# try:
# mongodb_client = PyMongo(
# app=app,
# uri='mongodb://flask_api_user:dbPass@db_mongo:27018/?authMechanism=DEFAULT&authSource=flask_api_db'
# # uri='mongodb://flask_api_user:dbPass@localhost:27018/?authMechanism=DEFAULT&authSource=flask_api_db'
# )
# db = mongodb_client.db
# print('DB: ', db, flush=True)
# # client.admin.command('ismaster')
# # print('OK', flush=True)
# except:
# print('Error', flush=True)
@app.route('/')
def index():
return 'Test2'
# @app.route("/add_one/")
# def add_one():
# db.my_collection.insert_one({'title': "todo title", 'body': "todo body"})
# return jsonify(message="success")
if __name__ == '__main__':
app.run(debug=True, host='0.0.0.0', port=8100)
Errors:
when I did this, it returned None:
# db = mongodb_client.db
# print('DB: ', db, flush=True)
Below are the errors from this approach:
uri = 'mongodb://flask_api_user:dbPass@db_mongo:27018/?authMechanism=DEFAULT&authSource=flask_api_db'
client = pymongo.MongoClient(uri)
client.admin.command('ismaster') # to check if the connection has been established - show errors in terminal
flask_api_container | Traceback (most recent call last):
flask_api_container | File "/code/main.py", line 9, in <module>
flask_api_container | client.admin.command('ismaster')
flask_api_container | File "/usr/local/lib/python3.10/site-packages/pymongo/database.py", line 721, in command
flask_api_container | with self.__client._socket_for_reads(read_preference, session) as (
flask_api_container | File "/usr/local/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1235, in _socket_for_reads
flask_api_container | server = self._select_server(read_preference, session)
flask_api_container | File "/usr/local/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1196, in _select_server
flask_api_container | server = topology.select_server(server_selector)
flask_api_container | File "/usr/local/lib/python3.10/site-packages/pymongo/topology.py", line 251, in select_server
flask_api_container | servers = self.select_servers(selector, server_selection_timeout, address)
flask_api_container | File "/usr/local/lib/python3.10/site-packages/pymongo/topology.py", line 212, in select_servers
flask_api_container | server_descriptions = self._select_servers_loop(selector, server_timeout, address)
flask_api_container | File "/usr/local/lib/python3.10/site-packages/pymongo/topology.py", line 227, in _select_servers_loop
flask_api_container | raise ServerSelectionTimeoutError(
flask_api_container | pymongo.errors.ServerSelectionTimeoutError: db_mongo:27018: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 62a9baa2cc5e740091e0a60b, topology_type: Unknown, servers: [<ServerDescription ('db_mongo', 27018) server_type: Unknown, rtt: None, error=AutoReconnect('db_mongo:27018: [Errno 111] Connection refused')>]>
flask_api_container exited with code 1
None of the URIs I used worked.
Thanks for any help in resolving this issue.
(Screenshots linked in the original post: main.py, MongoDB Compass, PyCharm database, main.py 2)
The mapped port (27018) is used when you connect from the host to the database. Containers on the same bridge network as the database connect directly to the container and should use the container port (27017).
So your connection string should be
uri = 'mongodb://flask_api_user:dbPass@db_mongo:27017/?authMechanism=DEFAULT&authSource=flask_api_db'
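For illustration, a minimal pymongo sketch of both cases, using the credentials and service name from the question (serverSelectionTimeoutMS is only there to fail fast):

from pymongo import MongoClient

# Inside the compose network: use the service name and the container port (27017).
in_container_uri = 'mongodb://flask_api_user:dbPass@db_mongo:27017/?authSource=flask_api_db'

# From the host machine: use localhost and the published port (27018).
from_host_uri = 'mongodb://flask_api_user:dbPass@localhost:27018/?authSource=flask_api_db'

client = MongoClient(in_container_uri, serverSelectionTimeoutMS=5000)
print(client.admin.command('ping'))  # raises an error if the server can't be reached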
I am trying to create a JupyterHub that uses LDAP to authenticate users.
The JupyterHub itself is working, but when I try to log in, the web page shows an Error 500.
I should clarify that with the PAM method it worked without problems, but when I configured it to use LDAP it started to fail.
I am using docker-compose. I currently have three containers:
1. JupyterHub:
jupyter:
build:
context: ./jupyterhub
ports:
- "8380:8000"
environment:
VIRTUAL_HOST: jupyter.probandofran.eu
LETSENCRYPT_HOST: jupyter.probandofran.eu
VIRTUAL_PORT: 8000
restart: on-failure
depends_on:
- jupyterlab
- revproxy-letsencrypt
volumes:
- ${VOLUMES_BASE_PATH}/jupyter:/home
- /var/run/docker.sock:/var/run/docker.sock:ro
healthcheck:
test: curl --fail -s http://jupyter:8000/ || exit 1
interval: 10s # time between tests
timeout: 5s # time waiting for a response
start_period: 30s # time from launch when the failed tests are ignored
retries: 5 # countdown of failed tests
2. JupyterLab:
jupyterlab:
build:
context: ./jupyterlab
image: ngd-jupyterlab
command: echo
healthcheck:
test: curl --fail -s http://jupyterlab:8080/ || exit 1
interval: 10s # time between tests
timeout: 5s # time waiting for a response
start_period: 30s # time from launch when the failed tests are ignored
retries: 5 # countdown of failed tests
3. OpenLDAP
openldap:
image: docker.io/bitnami/openldap:2.6
ports:
- '1389:1389'
- '1636:1636'
environment:
- LDAP_ENABLE_TLS=no
- LDAP_ADMIN_USERNAME=admin
- LDAP_ADMIN_PASSWORD=adminpassword
- LDAP_USERS=user1,user02
- LDAP_PASSWORDS=user1,password2
volumes:
- 'openldap_data:/bitnami/openldap'
I think the problem is in the JupyterHub configuration file
jupyterhub_config.py
from jupyter_client.localinterfaces import public_ips
ip = public_ips()[0]
c.Spawner.default_url = '/lab'
c.Authenticator.admin_users = {'fran'}
c.JupyterHub.admin_access = False
in_docker_compose = True # "False" for standalone testing (for instance, to check changes to Jupyterlab image)
c.JupyterHub.hub_ip = ip if not in_docker_compose else '0.0.0.0'
# 'jupyter' is the name of Jupyterhub service in "docker-compose" file
c.JupyterHub.hub_connect_ip = '' if not in_docker_compose else 'jupyter'
if in_docker_compose:
c.DockerSpawner.network_name = 'webproxy' # Should match the network name used in the docker compose file
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.image = 'ngd-jupyterlab' # 'jupyter/datascience-notebook:r-4.0.3'
notebook_dir = "/home/jovyan/"
c.DockerSpawner.notebook_dir = notebook_dir
c.DockerSpawner.volumes = {'jupyterhub-user-{username}': dict(bind=notebook_dir, mode="rw")}
c.DockerSpawner.use_internal_ip = True
c.DockerSpawner.remove_containers = True
c.DockerSpawner.remove = True
# c.DockerSpawner.extra_host_config = { 'network_mode': network_name }
c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'
c.LDAPAuthenticator.lookup_dn = False
c.LDAPAuthenticator.bind_dn_template = [
"uid={username},ou=people,dc=wikimedia,dc=org",
"cn={username},ou=users,dc=example,dc=org"
]
# c.JupyterHub.allow_named_servers = False
# c.JupyterHub.authenticator_class = 'jupyterhub.auth.PAMAuthenticator'
c.JupyterHub.cleanup_proxy = True
c.JupyterHub.cleanup_servers = True
c.LDAPAuthenticator.server_use_ssl = False
c.LDAPAuthenticator.use_ssl = False
c.LDAPAuthenticator.server_address = 'openldap'
c.LDAPAuthenticator.server_port = 1389
c.LDAPAuthenticator.user_search_base = 'dc=example,dc=org'
# c.JupyterHub.reset_db = True
When I run docker-compose up it shows:
jupyter_1 | [E 2022-04-20 11:57:07.496 JupyterHub web:1789] Uncaught exception POST /hub/login?next=%2Fhub%2F (192.168.112.1)
jupyter_1 | HTTPServerRequest(protocol='http', host='jupyter.probandofran.eu', method='POST', uri='/hub/login?next=%2Fhub%2F', version='HTTP/1.1', remote_ip='192.168.112.1')
jupyter_1 | Traceback (most recent call last):
jupyter_1 | File "/usr/local/lib/python3.8/dist-packages/tornado/web.py", line 1704, in _execute
jupyter_1 | result = await result
jupyter_1 | File "/usr/local/lib/python3.8/dist-packages/jupyterhub/handlers/login.py", line 151, in post
jupyter_1 | user = await self.login_user(data)
jupyter_1 | File "/usr/local/lib/python3.8/dist-packages/jupyterhub/handlers/base.py", line 804, in login_user
jupyter_1 | authenticated = await self.authenticate(data)
jupyter_1 | File "/usr/local/lib/python3.8/dist-packages/jupyterhub/auth.py", line 473, in get_authenticated_user
jupyter_1 | authenticated = await maybe_future(self.authenticate(handler, data))
jupyter_1 | File "/usr/local/lib/python3.8/dist-packages/ldapauthenticator/ldapauthenticator.py", line 382, in authenticate
jupyter_1 | conn = self.get_connection(userdn, password)
jupyter_1 | File "/usr/local/lib/python3.8/dist-packages/ldapauthenticator/ldapauthenticator.py", line 314, in get_connection
jupyter_1 | conn = ldap3.Connection(
jupyter_1 | File "/usr/local/lib/python3.8/dist-packages/ldap3/core/connection.py", line 355, in __init__
jupyter_1 | self.do_auto_bind()
jupyter_1 | File "/usr/local/lib/python3.8/dist-packages/ldap3/core/connection.py", line 374, in do_auto_bind
jupyter_1 | self.start_tls(read_server_info=False)
jupyter_1 | File "/usr/local/lib/python3.8/dist-packages/ldap3/core/connection.py", line 1264, in start_tls
jupyter_1 | if self.server.tls.start_tls(self) and self.strategy.sync: # for asynchronous connections _start_tls is run by the strategy
jupyter_1 | File "/usr/local/lib/python3.8/dist-packages/ldap3/core/tls.py", line 277, in start_tls
jupyter_1 | raise LDAPStartTLSError(connection.last_error)
jupyter_1 | ldap3.core.exceptions.LDAPStartTLSError: startTLS failed - protocolError
jupyter_1 |
It's my first time posting on Stack Overflow; I'm sorry if I made any mistakes in the post.
I think it is a bug in ldapauthenticator; we can hot-fix it by overriding the library's logic from jupyterhub_config.py,
as @jnishii suggests in https://github.com/jupyterhub/ldapauthenticator/issues/211.
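I haven't verified the exact patch from that issue, but a sketch of such an override in jupyterhub_config.py could look like this (it assumes plain LDAP on port 1389 and replaces ldapauthenticator's get_connection so that no StartTLS is attempted; the helper name is mine):

import ldap3
from ldapauthenticator import LDAPAuthenticator

def _get_connection_no_tls(self, userdn, password):
    # Build the connection without issuing StartTLS, which this openldap
    # container (LDAP_ENABLE_TLS=no) rejects with protocolError.
    server = ldap3.Server(self.server_address, port=self.server_port, use_ssl=False)
    return ldap3.Connection(
        server, user=userdn, password=password,
        auto_bind=ldap3.AUTO_BIND_NO_TLS,
    )

LDAPAuthenticator.get_connection = _get_connection_no_tls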
I am running into an issue with a MySQL Python connection.
Traceback (most recent call last):
File "./expconfig.py", line 176, in <module>
cnx = mysql.connector.connect(**config)
File "/usr/lib/python2.6/site-packages/mysql/connector/__init__.py", line 179, in connect
return MySQLConnection(*args, **kwargs)
File "/usr/lib/python2.6/site-packages/mysql/connector/connection.py", line 95, in __init__
self.connect(**kwargs)
File "/usr/lib/python2.6/site-packages/mysql/connector/abstracts.py", line 728, in connect
self._open_connection()
File "/usr/lib/python2.6/site-packages/mysql/connector/connection.py", line 228, in _open_connection
self._ssl)
File "/usr/lib/python2.6/site-packages/mysql/connector/connection.py", line 150, in _do_auth
ssl_options.get('cipher'))
File "/usr/lib/python2.6/site-packages/mysql/connector/network.py", line 420, in switch_to_ssl
ssl_version=ssl.PROTOCOL_TLSv1, ciphers=cipher)
TypeError: wrap_socket() got an unexpected keyword argument 'ciphers'
MySQL server has "ssl_diabled" , so the client doesn't need SSL connection. However it invokes
I have the following:
Python: 2.6.6
MySQL Python connector: mysql-connector-python-2.1.7
OS: RHEL 6.6
MySQL server version: 5.7.17-enterprise-commercial-advanced
Code
try:
flags = [ClientFlag.FOUND_ROWS,-ClientFlag.SSL]
config = {
'user' : 'ed30_user',
'password' : 'mypassword',
'host' : options.remHost,
'database' : 'config',
'client_flags': [-ClientFlag.SSL],
'ssl_disabled' : False
}
cnx = mysql.connector.connect(**config)
cur=cnx.cursor(dictionary=True)
The reason for the error was that Python 2.6.6 has an "ssl" module with a different method signature than the one expected by MySQL Connector version 2.1.7.
The MySQL page https://dev.mysql.com/doc/connector-python/en/connector-python-versions.html had an incorrect version table.
I dropped back to connector version mysql-connector-python-2.0.5-1.el6.noarch.rpm and the error went away.
Just set 'ssl_disabled': True if you are using a config dict,
or pass ssl_disabled=True if your config is given as keyword arguments to the mysql.connector.connect function.
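A minimal sketch of the corrected config from the question (the host is a placeholder standing in for options.remHost):

import mysql.connector

config = {
    'user': 'ed30_user',
    'password': 'mypassword',
    'host': 'remote-host',      # placeholder for options.remHost in the original script
    'database': 'config',
    'ssl_disabled': True,       # boolean True, not the string 'True' and not False
}

cnx = mysql.connector.connect(**config)
cur = cnx.cursor(dictionary=True)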