ServerSelectionTimeoutError: Timeout error when connecting to Atlas via pymongo - mongodb

I'm trying to connect to my Atlas mongodb database through pymongo. I make the connection and do a basic query just to count the documents and it times out.
I am able to run the same connection string on my personal Linux machine (and got it working from a clean Docker container there), but I did not manage to get it working from the Mac I use for work (neither did my colleagues, and I wasn't able to make it work from a clean Docker image on the Mac either). If it matters, I'm running pymongo 3.8, installed with pip install pymongo[tls]. I also tried downgrading, as well as pip install pymongo[tls,srv].
Personal guesses: maybe something to do with a proxy/firewall blocking the connection? I checked whether the port was open, and on the server I whitelisted 0.0.0.0/0, so that shouldn't be the issue.
import pymongo
client = pymongo.MongoClient("mongodb+srv://whatever:yep@cluster0-xxxxx.mongodb.net/test?retryWrites=true")
client.test.matches.count_documents({}) # this blocks and then errors
I get the following error
/usr/local/lib/python3.7/site-packages/pymongo/collection.py in count_documents(self, filter, session, **kwargs)
1693 collation = validate_collation_or_none(kwargs.pop('collation', None))
1694 cmd.update(kwargs)
-> 1695 with self._socket_for_reads(session) as (sock_info, slave_ok):
1696 result = self._aggregate_one_result(
1697 sock_info, slave_ok, cmd, collation, session)
/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py in __enter__(self)
110 del self.args, self.kwds, self.func
111 try:
--> 112 return next(self.gen)
113 except StopIteration:
114 raise RuntimeError("generator didn't yield") from None
/usr/local/lib/python3.7/site-packages/pymongo/mongo_client.py in _socket_for_reads(self, read_preference)
1133 topology = self._get_topology()
1134 single = topology.description.topology_type == TOPOLOGY_TYPE.Single
-> 1135 server = topology.select_server(read_preference)
1136
1137 with self._get_socket(server) as sock_info:
/usr/local/lib/python3.7/site-packages/pymongo/topology.py in select_server(self, selector, server_selection_timeout, address)
224 return random.choice(self.select_servers(selector,
225 server_selection_timeout,
--> 226 address))
227
228 def select_server_by_address(self, address,
/usr/local/lib/python3.7/site-packages/pymongo/topology.py in select_servers(self, selector, server_selection_timeout, address)
182 with self._lock:
183 server_descriptions = self._select_servers_loop(
--> 184 selector, server_timeout, address)
185
186 return [self.get_server_by_address(sd.address)
/usr/local/lib/python3.7/site-packages/pymongo/topology.py in _select_servers_loop(self, selector, timeout, address)
198 if timeout == 0 or now > end_time:
199 raise ServerSelectionTimeoutError(
--> 200 self._error_message(selector))
201
202 self._ensure_opened()
ServerSelectionTimeoutError: cluster0-shard-00-01-eflth.mongodb.net:27017: timed out,cluster0-shard-00-00-eflth.mongodb.net:27017: timed out,cluster0-shard-00-02-eflth.mongodb.net:27017: timed out
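A quick sanity check that helps distinguish a proxy/firewall block from a TLS or driver problem is to test plain TCP reachability to the shard hosts listed in the error. This is only a hedged diagnostic sketch; the hostnames are copied from the error message above, and shortening serverSelectionTimeoutMS just makes failures surface faster:
# Hedged diagnostic sketch: check raw TCP reachability to the shard hosts from the error above.
import socket

hosts = [
    "cluster0-shard-00-00-eflth.mongodb.net",
    "cluster0-shard-00-01-eflth.mongodb.net",
    "cluster0-shard-00-02-eflth.mongodb.net",
]
for host in hosts:
    try:
        with socket.create_connection((host, 27017), timeout=5):
            print(host, "reachable on 27017")
    except OSError as exc:
        print(host, "blocked or unreachable:", exc)

# If TCP works, retry the query with a shorter server selection timeout to fail fast:
# client = pymongo.MongoClient(uri, serverSelectionTimeoutMS=5000)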

Related

Why is pip install not working in Jupyter notebook?

When I run pip3 install <package>, !pip3 install <package>, or !pip install <package>, I get the error below. I also can't clone any repo in Jupyter; it gives the same error. This is my first time using Jupyter.
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 get_ipython().run_line_magic('pip', 'install boto3')
File /lib/python3.9/site-packages/IPython/core/interactiveshell.py:2294, in InteractiveShell.run_line_magic(self, magic_name, line, _stack_depth)
2292 kwargs['local_ns'] = self.get_local_scope(stack_depth)
2293 with self.builtin_trap:
-> 2294 result = fn(*args, **kwargs)
2295 return result
File /lib/python3.9/site-packages/IPython/core/magics/packaging.py:75, in PackagingMagics.pip(self, line)
72 else:
73 python = shlex.quote(python)
---> 75 self.shell.system(" ".join([python, "-m", "pip", line]))
77 print("Note: you may need to restart the kernel to use updated packages.")
File /lib/python3.9/site-packages/IPython/core/interactiveshell.py:2451, in InteractiveShell.system_piped(self, cmd)
2446 raise OSError("Background processes not supported.")
2448 # we explicitly do NOT return the subprocess status code, because
2449 # a non-None value would trigger :func:`sys.displayhook` calls.
2450 # Instead, we store the exit_code in user_ns.
-> 2451 self.user_ns['_exit_code'] = system(self.var_expand(cmd, depth=1))
File /lib/python3.9/site-packages/IPython/utils/_process_posix.py:148, in ProcessHandler.system(self, cmd)
146 child = pexpect.spawnb(self.sh, args=['-c', cmd]) # Pexpect-U
147 else:
--> 148 child = pexpect.spawn(self.sh, args=['-c', cmd]) # Vanilla Pexpect
149 flush = sys.stdout.flush
150 while True:
151 # res is the index of the pattern that caused the match, so we
152 # know whether we've finished (if we matched EOF) or not
File /lib/python3.9/site-packages/IPython/utils/_process_posix.py:57, in ProcessHandler.sh(self)
55 self._sh = pexpect.which(shell_name)
56 if self._sh is None:
---> 57 raise OSError('"{}" shell not found'.format(shell_name))
59 return self._sh
I searched everywhere, but weirdly no one seems to have faced this issue except me. Please suggest a solution; it's driving me crazy.
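One workaround worth trying, just a hedged sketch: the traceback shows IPython's %pip magic failing because it cannot find a shell to spawn, so calling pip through the running interpreter directly avoids pexpect and the shell entirely:
import subprocess
import sys

# Install into the same Python environment the notebook kernel is using,
# without going through a shell (which is what the %pip magic failed to find).
subprocess.check_call([sys.executable, "-m", "pip", "install", "boto3"])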

MongoClient insert_one works while mongoengine connect doesn't (unauthorized)

I try to insert a document using the mongoengine interface AFTER authenticating, but I still get denied. This doesn't happen with MongoClient...
This is the mongoengine attempt to insert one document:
In [1]: from mongoengine import connect
In [2]: db = connect(host='localhost', port=27017, username='root', password='pass')
In [3]: db.local.col.insert_one({'a':1})
---------------------------------------------------------------------------
OperationFailure Traceback (most recent call last)
<ipython-input-3-55a23806fbb1> in <module>
----> 1 db.local.col.insert_one({'a':1})
~/venv3.8/lib/python3.8/site-packages/pymongo/collection.py in insert_one(self, document, bypass_document_validation, session)
696 write_concern = self._write_concern_for(session)
697 return InsertOneResult(
--> 698 self._insert(document,
699 write_concern=write_concern,
700 bypass_doc_val=bypass_document_validation,
~/venv3.8/lib/python3.8/site-packages/pymongo/collection.py in _insert(self, docs, ordered, check_keys, manipulate, write_concern, op_id, bypass_doc_val, session)
611 """Internal insert helper."""
612 if isinstance(docs, abc.Mapping):
--> 613 return self._insert_one(
614 docs, ordered, check_keys, manipulate, write_concern, op_id,
615 bypass_doc_val, session)
~/venv3.8/lib/python3.8/site-packages/pymongo/collection.py in _insert_one(self, doc, ordered, check_keys, manipulate, write_concern, op_id, bypass_doc_val, session)
600 _check_write_command_response(result)
601
--> 602 self.__database.client._retryable_write(
603 acknowledged, _insert_command, session)
604
~/venv3.8/lib/python3.8/site-packages/pymongo/mongo_client.py in _retryable_write(self, retryable, func, session)
1496 """Internal retryable write helper."""
1497 with self._tmp_session(session) as s:
-> 1498 return self._retry_with_session(retryable, func, s, None)
1499
1500 def _handle_getlasterror(self, address, error_msg):
~/venv3.8/lib/python3.8/site-packages/pymongo/mongo_client.py in _retry_with_session(self, retryable, func, session, bulk)
1382 retryable = (retryable and self.retry_writes
1383 and session and not session.in_transaction)
-> 1384 return self._retry_internal(retryable, func, session, bulk)
1385
1386 def _retry_internal(self, retryable, func, session, bulk):
~/venv3.8/lib/python3.8/site-packages/pymongo/mongo_client.py in _retry_internal(self, retryable, func, session, bulk)
1414 raise last_error
1415 retryable = False
-> 1416 return func(session, sock_info, retryable)
1417 except ServerSelectionTimeoutError:
1418 if is_retrying():
~/venv3.8/lib/python3.8/site-packages/pymongo/collection.py in _insert_command(session, sock_info, retryable_write)
588 command['bypassDocumentValidation'] = True
589
--> 590 result = sock_info.command(
591 self.__database.name,
592 command,
~/venv3.8/lib/python3.8/site-packages/pymongo/pool.py in command(self, dbname, spec, slave_ok, read_preference, codec_options, check, allowable_errors, check_keys, read_concern, write_concern, parse_write_concern_error, collation, session, client, retryable_write, publish_events, user_fields, exhaust_allowed)
681 self._raise_if_not_writable(unacknowledged)
682 try:
--> 683 return command(self, dbname, spec, slave_ok,
684 self.is_mongos, read_preference, codec_options,
685 session, client, check, allowable_errors,
~/venv3.8/lib/python3.8/site-packages/pymongo/network.py in command(sock_info, dbname, spec, slave_ok, is_mongos, read_preference, codec_options, session, client, check, allowable_errors, address, check_keys, listeners, max_bson_size, read_concern, parse_write_concern_error, collation, compression_ctx, use_op_m
sg, unacknowledged, user_fields, exhaust_allowed)
157 client._process_response(response_doc, session)
158 if check:
--> 159 helpers._check_command_response(
160 response_doc, sock_info.max_wire_version, None,
161 allowable_errors,
~/venv3.8/lib/python3.8/site-packages/pymongo/helpers.py in _check_command_response(response, max_wire_version, msg, allowable_errors, parse_write_concern_error)
165
166 msg = msg or "%s"
--> 167 raise OperationFailure(msg % errmsg, code, response,
168 max_wire_version)
169
OperationFailure: command insert requires authentication, full error: {'ok': 0.0, 'errmsg': 'command insert requires authentication', 'code': 13, 'codeName': 'Unauthorized'}
which fails, but the MongoClient works for some reason:
In [4]: from pymongo import MongoClient
In [5]: col = MongoClient(host='localhost', port=27017, username='root', password='pass')
In [6]: col.local.col.insert_one({'a':1})
Out[6]: <pymongo.results.InsertOneResult at 0x7ff2a347a8c0>
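A hedged guess at the difference: mongoengine may be authenticating against the target database rather than the admin database that MongoClient defaults to here. If the root user was created in admin, passing authentication_source explicitly should line the two up (a sketch, not verified against this setup):
from mongoengine import connect

# Hedged sketch: authenticate against the admin database, mirroring MongoClient's default.
db = connect(
    db="local",
    host="localhost",
    port=27017,
    username="root",
    password="pass",
    authentication_source="admin",  # assumption: the root user was created in the admin database
)
db.local.col.insert_one({"a": 1})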

Using pretrained models from sparknlp on Databricks

I am trying to follow the official examples from John Snow Labs, but every time I get a TypeError: 'JavaPackage' object is not callable error. I followed all of the steps in the Databricks install documentation, but no matter which walkthrough I try, either this one or this one, it fails.
An example of the first (after doing the installs):
import sparknlp
from sparknlp.pretrained import *
pipeline = PretrainedPipeline('recognize_entities_dl', 'en')
recognize_entities_dl download started this may take some time.
TypeError: 'JavaPackage' object is not callable
TypeError Traceback (most recent call last)
<command-937510457011238> in <module>
----> 1 pipeline = PretrainedPipeline('recognize_entities_dl', 'en')
2
3 # ner_bert = NerDLModel.pretrained('ner_dl_bert')
4
5 # pipeline = PretrainedPipeline('recognize_entities_dl', 'en', 'https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_dl_bert_en_2.4.3_2.4_1584624951079.zip')
/databricks/python/lib/python3.7/site-packages/sparknlp/pretrained.py in __init__(self, name, lang, remote_loc, parse_embeddings, disk_location)
89 def __init__(self, name, lang='en', remote_loc=None, parse_embeddings=False, disk_location=None):
90 if not disk_location:
---> 91 self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
92 else:
93 self.model = PipelineModel.load(disk_location)
/databricks/python/lib/python3.7/site-packages/sparknlp/pretrained.py in downloadPipeline(name, language, remote_loc)
49 def downloadPipeline(name, language, remote_loc=None):
50 print(name + " download started this may take some time.")
---> 51 file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
52 if file_size == "-1":
53 print("Can not find the model to download please check the name!")
/databricks/python/lib/python3.7/site-packages/sparknlp/internal.py in __init__(self, name, language, remote_loc)
190 def __init__(self, name, language, remote_loc):
191 super(_GetResourceSize, self).__init__(
--> 192 "com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize", name, language, remote_loc)
193
194
/databricks/python/lib/python3.7/site-packages/sparknlp/internal.py in __init__(self, java_obj, *args)
127 super(ExtendedJavaWrapper, self).__init__(java_obj)
128 self.sc = SparkContext._active_spark_context
--> 129 self._java_obj = self.new_java_obj(java_obj, *args)
130 self.java_obj = self._java_obj
131
/databricks/python/lib/python3.7/site-packages/sparknlp/internal.py in new_java_obj(self, java_class, *args)
137
138 def new_java_obj(self, java_class, *args):
--> 139 return self._new_java_obj(java_class, *args)
140
141 def new_java_array(self, pylist, java_class):
/databricks/spark/python/pyspark/ml/wrapper.py in _new_java_obj(java_class, *args)
65 java_obj = getattr(java_obj, name)
66 java_args = [_py2java(sc, arg) for arg in args]
---> 67 return java_obj(*java_args)
68
69 #staticmethod
TypeError: 'JavaPackage' object is not callable
I get a similar, if not identical, error if I try:
pipeline = PretrainedPipeline('recognize_entities_dl', 'en', 'https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_dl_bert_en_2.4.3_2.4_1584624951079.zip')
I also get the same error for the second example. The Databricks Runtime Version is: 6.5 (includes Apache Spark 2.4.5, Scala 2.11), which is on the list of approved runtimes.
I'm not sure what the error messages mean or how to resolve them.
I found out that 'JavaPackage' object is not callable is caused by the spark-nlp assembly jars missing from the classpath. So I made sure these jars were downloaded and then placed on BOTH the executor and the driver. E.g.,
when building the Spark docker image do something like
RUN cd /opt/spark/jars && \
wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-assembly-2.6.4.jar
and also, on the driver image/machine, make sure the jar exists in the local directory. Then set
conf.set("spark.driver.extraClassPath", "/opt/spark/jars/spark-nlp-assembly-2.6.4.jar")
conf.set("spark.executor.extraClassPath", "/opt/spark/jars/spark-nlp-assembly-2.6.4.jar")
The solution for Databricks might be a bit different, so instead of baking the jars into the image you may need to host them on S3 and refer to them that way.
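To confirm whether the assembly jar is actually visible to the driver JVM before downloading any pretrained model, a quick hedged check (assuming an active spark session) is to resolve one of the classes the traceback tries to use; py4j returns a JavaPackage for a missing class and a JavaClass for one that is on the classpath:
# Hedged sketch: if this prints py4j.java_gateway.JavaPackage, the spark-nlp jar is not on the
# driver classpath; if it prints JavaClass, the jar was found.
cls = spark._jvm.com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader
print(type(cls))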

TypeError: 'JavaPackage' object is not callable for Xgboost in PySpark

I am trying to make the Scala XGBoost API available in my PySpark notebook, following this blog:
https://towardsdatascience.com/pyspark-and-xgboost-integration-tested-on-the-kaggle-titanic-dataset-4e75a568bdb
However, I keep running into the error below:
spark._jvm.ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator
<py4j.java_gateway.JavaPackage at 0x7fa650fe7a58>
from sparkxgb import XGBoostEstimator

xgboost = XGBoostEstimator(
    featuresCol="features",
    labelCol="Survival",
    predictionCol="prediction"
)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-18-1765fb9e3344> in <module>
4 featuresCol="features",
5 labelCol="Survival",
----> 6 predictionCol="prediction"
7 )
~/spark-assembly-2.4.0-twttr-kryo3-scala2128-hadoop2.9.2.t05/python/pyspark/__init__.py in wrapper(self, *args, **kwargs)
108 raise TypeError("Method %s forces keyword arguments." % func.__name__)
109 self._input_kwargs = kwargs
--> 110 return func(self, **kwargs)
111 return wrapper
112
~/local/spark-3536cd7a-6188-4ca8-b3d0-57d42cd01531/userFiles-0a0d90bc-96b4-43f2-bf21-00ae0e6f7309/sparkxgb.zip/sparkxgb/xgboost.py in __init__(self, checkpoint_path, checkpointInterval, missing, nthread, nworkers, silent, use_external_memory, baseMarginCol, featuresCol, labelCol, predictionCol, weightCol, base_score, booster, eval_metric, num_class, num_round, objective, seed, alpha, colsample_bytree, colsample_bylevel, eta, gamma, grow_policy, max_bin, max_delta_step, max_depth, min_child_weight, reg_lambda, scale_pos_weight, sketch_eps, subsample, tree_method, normalize_type, rate_drop, sample_type, skip_drop, lambda_bias)
113
114 super(XGBoostEstimator, self).__init__()
--> 115 self._java_obj = self._new_java_obj("ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator", self.uid)
116 self._create_params_from_java()
117 self._setDefault(
~/spark-assembly-2.4.0-twttr-kryo3-scala2128-hadoop2.9.2.t05/python/pyspark/ml/wrapper.py in _new_java_obj(java_class, *args)
65 java_obj = getattr(java_obj, name)
66 java_args = [_py2java(sc, arg) for arg in args]
---> 67 return java_obj(*java_args)
68
69 #staticmethod
TypeError: 'JavaPackage' object is not callable
I already googled this error and tried the things below. I got all the ideas from this GitHub issue: https://github.com/JohnSnowLabs/spark-nlp/issues/232
Make sure xgboost4j is in the SPARK_DIST_CLASSPATH. Already checked:
$echo $SPARK_DIST_CLASSPATH | tr " " "\n" | grep 'xgboost4j' | rev | cut -d'/' -f1 | rev
xgboost4j-0.72.jar
xgboost4j-spark.72.jar
Make sure they are added to EXTRA_CLASSPATH. Done.
Updated the configs:
'export PYSPARK_SUBMIT_ARGS="--conf spark.jars=$SPARK_HOME/jars/* --conf spark.driver.extraClassPath=$SPARK_HOME/jars/* --conf spark.executor.extraClassPath=$SPARK_HOME/jars/* pyspark-shell"',
Environment info:
Machine: Linux
Using Jupyter Notebook.
Spark Version 2.4.0
python3.6
I found the problem. The sparkxgb.zip I downloaded from the internet was written for xgboost4j-0.72, but my jars were from xgboost4j-0.90, and the API has changed completely between those versions. As a result, the 0.90 version doesn't have any class named ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator, hence the error. You can see the difference in the API below:
https://github.com/dmlc/xgboost/tree/release_0.72/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark
vs
https://github.com/dmlc/xgboost/tree/v0.90/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark
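A hedged way to confirm which API your jars expose from PySpark (assuming an active spark session) is the same JavaPackage check used at the top of the question; xgboost4j-spark 0.72 shipped XGBoostEstimator, while 0.90 replaced it with XGBoostClassifier and XGBoostRegressor:
# Hedged sketch: JavaPackage means the class is absent from the jars on the classpath,
# JavaClass means it is present.
print(type(spark._jvm.ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator))   # removed in 0.90
print(type(spark._jvm.ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier))  # present in 0.90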

Unable to open files, with the path in Jupyter notebook

I reinstalled Anaconda after formatting my machine, and since then I get an error when opening files in a Jupyter notebook.
Initially I tried to access the file from the Desktop location; when that gave an error, I tried again from the D drive. Neither attempt was successful.
salaries = pd.read_excel('D:\\housesales.xlsx')
Below is the error
FileNotFoundError Traceback (most recent call last)
<ipython-input-13-6d8e17cbb085> in <module>
----> 1 salaries = pd.read_excel('D:\housesales.xlsx')
~\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
186 else:
187 kwargs[new_arg_name] = new_arg_value
--> 188 return func(*args, **kwargs)
189 return wrapper
190 return _deprecate_kwarg
~\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
186 else:
187 kwargs[new_arg_name] = new_arg_value
--> 188 return func(*args, **kwargs)
189 return wrapper
190 return _deprecate_kwarg
~\Anaconda3\lib\site-packages\pandas\io\excel.py in read_excel(io, sheet_name, header, names, index_col, parse_cols, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skip_footer, skipfooter, convert_float, mangle_dupe_cols, **kwds)
348
349 if not isinstance(io, ExcelFile):
--> 350 io = ExcelFile(io, engine=engine)
351
352 return io.parse(
~\Anaconda3\lib\site-packages\pandas\io\excel.py in __init__(self, io, engine)
651 self._io = _stringify_path(io)
652
--> 653 self._reader = self._engines[engine](self._io)
654
655 def __fspath__(self):
~\Anaconda3\lib\site-packages\pandas\io\excel.py in __init__(self, filepath_or_buffer)
422 self.book = xlrd.open_workbook(file_contents=data)
423 elif isinstance(filepath_or_buffer, compat.string_types):
--> 424 self.book = xlrd.open_workbook(filepath_or_buffer)
425 else:
426 raise ValueError('Must explicitly set engine if not passing in'
~\Anaconda3\lib\site-packages\xlrd\__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
109 else:
110 filename = os.path.expanduser(filename)
--> 111 with open(filename, "rb") as f:
112 peek = f.read(peeksz)
113 if peek == b"PK\x03\x04": # a ZIP file
FileNotFoundError: [Errno 2] No such file or directory: 'D:\housesales.xlsx'
Sounds like your housesales.xlsx file is on your Desktop, but you do not include the Desktop folder in the path to your file.
salaries = pd.read_excel('D:\\Desktop\\housesales.xlsx')
I recommend you use JupyterLab, as it has a file tree.
Running this shell command in a notebook cell will tell you the working directory of your Jupyter instance, so you know where it is looking for files.
!pwd
You could also move your file to that directory and then just access it as
salaries = pd.read_excel('housesales.xlsx')
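As a quick check of where the kernel is actually looking, this hedged sketch prints the working directory and whether the file is visible from it (the paths are the ones used in the question):
import os
from pathlib import Path

# Where the notebook kernel is running, i.e. where relative paths are resolved from.
print(os.getcwd())

# Does the file exist at the absolute path used in the question?
print(Path(r"D:\housesales.xlsx").exists())

# Excel files visible from the working directory.
print(sorted(p.name for p in Path(".").glob("*.xlsx")))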