AWS Glue ETL MongoDB Connection String Error - mongodb

Issue using MongoDB with AWS Glue - I've created a connection to the database (using the MongoDB connection option) and run a crawler against it, and it all worked fine. But when I try to use this as a data source in a basic ETL job (script: Glue version 2.0, Python version 3), it throws the exception
py4j.protocol.Py4JJavaError: An error occurred while calling o70.getDynamicFrame.
: java.lang.RuntimeException: Mongo/DocumentDB connection URL is not supported
Has anyone had any success using MongoDb as a datasource in glue ETL jobs?
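One thing worth trying is reading the collection directly with from_options in the Glue 2.0 Python script instead of going through the crawler's catalog connection. A minimal sketch, with placeholder URI, database, collection and credentials, assuming Glue's documented mongodb connection type (the uri should be the plain mongodb:// server address, without the database path appended):

# A sketch only: host, database, collection and credentials are placeholders.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

mongo_options = {
    "uri": "mongodb://example-host:27017",  # server address only, no database path
    "database": "mydb",
    "collection": "mycollection",
    "username": "user",
    "password": "pass",
}

# Read the collection as a DynamicFrame via connection options
# instead of the catalog connection created by the crawler.
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="mongodb",
    connection_options=mongo_options,
)
print(dyf.count())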

Related

Snowflake pyspark connector exception net.snowflake.client.jdbc.SnowflakeSQLException

I am facing the below exception while trying to connect to Snowflake from PySpark:
py4j.protocol.Py4JJavaError: An error occurred while calling o117.load.
: net.snowflake.client.jdbc.SnowflakeSQLException: !200051!
at net.snowflake.client.core.SFBaseSession.getHttpClientKey(SFBaseSession.java:321)
at net.snowflake.client.core.SFSession.open(SFSession.java:408)
at net.snowflake.client.jdbc.DefaultSFConnectionHandler.initialize(DefaultSFConnectionHandler.java:104)
at net.snowflake.client.jdbc.DefaultSFConnectionHandler.initializeConnection(DefaultSFConnectionHandler.java:79)
at net.snowflake.client.jdbc.SnowflakeConnectionV1.initConnectionWithImpl(SnowflakeConnectionV1.java:116)
at net.snowflake.client.jdbc.SnowflakeConnectionV1.<init>(SnowflakeConnectionV1.java:96)
at net.snowflake.client.jdbc.SnowflakeDriver.connect(SnowflakeDriver.java:172)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at net.snowflake.spark.snowflake.JDBCWrapper.getConnector(SnowflakeJDBCWrapper.scala:209)
It looks like you are behind a firewall or a proxy server. I suggest using the Snowflake connectivity diagnostic tool SnowCD to make sure that all Snowflake URLs are reachable. If you see any errors, then you might want to check your firewall configuration or add a proxy configuration to the Spark connection.
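If a proxy is needed, here is a minimal PySpark sketch of passing proxy settings through the connector options; the account URL, credentials, warehouse, table and proxy values are placeholders, and the proxy keys assume the connector's documented use_proxy/proxy_host/proxy_port options:

# A sketch only: account, credentials, warehouse, table and proxy values are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-proxy-check").getOrCreate()

sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "user",
    "sfPassword": "password",
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
    # Proxy settings (assumed option names), only needed when outbound
    # traffic has to go through a corporate proxy.
    "use_proxy": "true",
    "proxy_host": "proxy.example.com",
    "proxy_port": "8080",
}

df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "MY_TABLE")
    .load()
)
df.show(5)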

IBM DB2 SQL Connection Errors

I can connect to IBM DB2 inside the IBM Cloud Pak for Data, but when I try to run the exact same %sql connection it errors out. What am I missing?
%sql ibm_db_sa://un:pw@host:port/db?security=SSL
(ibm_db_dbi.Error) ibm_db_dbi::Error: [IBM][CLI Driver] SQL5005C The operation failed because the database manager failed to access either the database manager configuration file or the database configuration file.\r SQLCODE=-5005
(Background on this error at: http://sqlalche.me/e/dbapi)
Connection info needed in SQLAlchemy format, example:
postgresql://username:password@hostname/dbname
or an existing connection: dict_keys([])
Try loading the package ibm_db
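A minimal notebook-cell sketch, assuming the ibm_db, ibm_db_sa and ipython-sql packages are available and using the same placeholder credentials as above (note the @ between the password and the host):

# A sketch only: un, pw, host, port and db are the same placeholders as above.
!pip install ibm_db ibm_db_sa ipython-sql

import ibm_db       # CLI driver binding referenced by the error message
import ibm_db_sa    # SQLAlchemy dialect used by the %sql magic

%load_ext sql
%sql ibm_db_sa://un:pw@host:port/db?security=SSL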

Spark job dataframe write to Oracle using jdbc failing

When writing a Spark dataframe to an Oracle database (Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit), the Spark job fails with the exception java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection. The Scala code is
dataFrame.write.mode(SaveMode.Append).jdbc("jdbc:oracle:thin:@" + ipPort + ":" + sid, table, props)
I have already tried setting the properties below for the JDBC connection, but it hasn't worked.
props.put("driver", "oracle.jdbc.OracleDriver")
props.setProperty("testOnBorrow","true")
props.setProperty("testOnReturn","false")
props.setProperty("testWhileIdle","false")
props.setProperty("validationQuery","SELECT 1 FROM DUAL")
props.setProperty("autoReconnect", "true")
Based on earlier search results, it seems that the connection is opened initially but is killed by the firewall after some idle time. The connection URL has been verified and works, since SELECT queries run fine. Need help getting this resolved.
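For comparison, a PySpark sketch of the same append write; the host, SID, credentials and table are placeholders, and the two timeout properties are assumptions about Oracle thin-driver settings that make a blocked connection fail fast rather than hang. Since the write runs on the executors, it is also worth checking that every executor node can reach the Oracle host and port, not just the machine where the SELECT queries were tested.

# A sketch only: host, SID, credentials and table are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-append-write").getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "val"])

ip_port = "dbhost.example.com:1521"
sid = "ORCL"

jdbc_props = {
    "user": "scott",
    "password": "tiger",
    "driver": "oracle.jdbc.OracleDriver",
    # Assumed Oracle thin-driver properties: fail fast instead of hanging
    # when the network path to the database is blocked (values in ms).
    "oracle.net.CONNECT_TIMEOUT": "10000",
    "oracle.jdbc.ReadTimeout": "60000",
}

df.write.mode("append").jdbc(
    url="jdbc:oracle:thin:@" + ip_port + ":" + sid,
    table="MY_TABLE",
    properties=jdbc_props,
)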

Streamsets DC and Crate exception. ERROR: SQLParseException: line 1:13: no viable alternative at input 'CHARACTERISTICS'

I am trying to connect to Crate as a Streamsets Data collector pipeline origin ( JDBC Consumer ). However I get this error: "JDBC_00 - Cannot connect to specified database: com.streamsets.pipeline.api.StageException: JDBC_06 - Failed to initialize connection pool: com.zaxxer.hikari.pool.PoolInitializationException: Exception during pool initialization: ERROR: SQLParseException: line 1:13: no viable alternative at input 'CHARACTERISTICS' "
Why am I getting this error? The Crate JDBC driver version is 2.1.5 and the StreamSets Data Collector version is 2.4.0.0.
@gashey already solved the issue: within StreamSets DC, uncheck Enforce Read-only Connection on the Advanced tab of the JDBC Query Consumer configuration. (The read-only setting apparently makes the connection pool issue a SET SESSION CHARACTERISTICS statement that Crate's SQL parser does not accept, which is where the error at 'CHARACTERISTICS' comes from.)
(see https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/crateio/hBexxel2KQw/kU34mrsJBgAJ).
We will update the StreamSets documentation with the workaround: https://crate.io/docs/tools/streamsets/

MongoInternalException: DBPort.findOne failed while running on GAE localserver

I am trying to connect to a remote MongoDB (MongoLab) from my local GAE server (localhost:8888). I am using Morphia and my MongoDB driver version is 2.4. My code looks like this:
Mongo m = new Mongo("xyz.mongolab.com",);
Datastore datastore = new Morphia().createDatastore(m, "staging","uname","password".toCharArray());
This throws the following exception:
com.mongodb.MongoInternalException: DBPort.findOne failed
at com.mongodb.DBPort.findOne(DBPort.java:153)
at com.mongodb.DBPort.runCommand(DBPort.java:159)
at com.mongodb.DBTCPConnector.testMaster(DBTCPConnector.java:371)
at com.mongodb.Mongo.<init>(Mongo.java:167)
Caused by: java.io.IOException: couldn't connect to [xyz.mongolab.com/:] bc:java.net.SocketException: Operation failure: setSocketOptions: Not yet implemented
at com.mongodb.DBPort._open(DBPort.java:205)
Does somebody know why this is happening?
It was a problem with using the old MongoDB driver; it works after I upgraded.