MongoDB Assertion Error: starting_from == self.__retrieved (pymongo driver) - mongodb

MongoDB Question:
We're using a sharded replicaset, running pymongo 2.2 against mongo (version: 2.1.1-pre-). We're getting a traceback when a query returns more than one result document.
Traceback (most recent call last):
File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/opt/DCM/mods/plugin.py", line 25, in run
self._mod.collect_metrics_dcm()
File "/opt/DCM/plugins/res.py", line 115, in collect_metrics_dcm
ms.updateSpecificMetric(metricName, value, timestamp)
File "/opt/DCM/mods/mongoSaver.py", line 155, in updateSpecificMetric
latestDoc = self.getLatestDoc(metricName)
File "/opt/DCM/mods/mongoSaver.py", line 70, in getLatestDoc
for d in dlist:
File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 747, in next
if len(self.__data) or self._refresh():
File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 698, in _refresh
self.__uuid_subtype))
File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 668, in __send_message
assert response["starting_from"] == self.__retrieved
AssertionError
The code that produces dlist is a simple find(). I've tried reIndex() and I've tried stopping and starting the mongo server; neither helped.
This is easily replicable for me. Any ideas?

Ok, so traced this down a bit, and I have a SOLUTION for this assertion error.
There is a BUG in Mongo. When querying a sharded replicaset, Mongo returns an incorrect value for 'starting_from'. On the first batch it should return 0 (the offset), but it returns the number of records received instead. I have a patch for pymongo to protect against this bad info:
File is site-packages/pymongo/cursor.py.
[user#hostname]$ diff cursor.py.orig cursor.py
631,632c631,634
< if not self.__tailable:
< assert response["starting_from"] == self.__retrieved
---
> if ((not self.__tailable) and (self.__retrieved != 0) and (response["starting_from"] != self.__retrieved)):
> from pprint import pformat
> msg = "Server response of 'starting_from' is '%s', but self__retrieved (which is only set to nonzero below here) is '%s'." % (pformat(response), pformat(self.__retrieved))
> assert False, msg
The 'starting_from' comes from helpers.py decoding the response from Mongo:
result["starting_from"] = struct.unpack("<i", response[12:16])[0]
So it's bytes 12 through 15 (zero-based offsets) of Mongo's response.
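To make the byte layout concrete, here is a minimal sketch that packs and unpacks the fixed 20-byte prefix of a reply. It assumes `response` is an OP_REPLY body with the 16-byte standard message header already stripped, laid out as responseFlags (int32), cursorID (int64), startingFrom (int32), numberReturned (int32), all little-endian. `parse_reply_prefix` and the sample values are hypothetical and are not part of pymongo.

```python
import struct

# Assumed OP_REPLY body prefix: flags (int32), cursor id (int64),
# starting_from (int32), number_returned (int32), little-endian, no padding.
REPLY_PREFIX = "<iqii"


def parse_reply_prefix(response):
    """Decode the first 20 bytes of a reply body into named fields."""
    flags, cursor_id, starting_from, number_returned = struct.unpack(
        REPLY_PREFIX, response[:20]
    )
    return {
        "flags": flags,
        "cursor_id": cursor_id,
        "starting_from": starting_from,
        "number_returned": number_returned,
    }


# A made-up first batch: offset 0, two documents returned.
good_first_batch = struct.pack(REPLY_PREFIX, 0, 1234, 0, 2)
# What the buggy server is described as sending: offset equals the count.
bad_first_batch = struct.pack(REPLY_PREFIX, 0, 1234, 2, 2)

print(parse_reply_prefix(good_first_batch)["starting_from"])
print(parse_reply_prefix(bad_first_batch)["starting_from"])
```

The slice `response[12:16]` quoted above selects the same four bytes that the third field of this format string covers.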

This is a bug in the 2.1.1 development release of mongos. See https://jira.mongodb.org/browse/SERVER-5844

Related

Why does PyMongo give Unsupported projection option: $substr when listing collections?

I have the following code:
client = MongoClient(uri)
db = client['my_db']
print(db.collection_names())
#print(db.list_collection_names())
and I get the error
File "C:\Users\gwerner004\eclipse-workspace\MongoTestRasa\FirstTest.py", line 17, in connect
print(db.collection_names())
File "C:\Users\gwerner004\AppData\Local\Programs\Python\Python36\lib\site-packages\pymongo\database.py", line 715, in collection_names
nameOnly=True, **kws)]
File "C:\Users\gwerner004\AppData\Local\Programs\Python\Python36\lib\site-packages\pymongo\database.py", line 677, in list_collections
**kwargs)
File "C:\Users\gwerner004\AppData\Local\Programs\Python\Python36\lib\site-packages\pymongo\database.py", line 651, in _list_collections
cursor = self._command(sock_info, cmd, slave_okay)["cursor"]
File "C:\Users\gwerner004\AppData\Local\Programs\Python\Python36\lib\site-packages\pymongo\database.py", line 514, in _command
client=self.__client)
File "C:\Users\gwerner004\AppData\Local\Programs\Python\Python36\lib\site-packages\pymongo\pool.py", line 579, in command
unacknowledged=unacknowledged)
File "C:\Users\gwerner004\AppData\Local\Programs\Python\Python36\lib\site-packages\pymongo\network.py", line 150, in command
parse_write_concern_error=parse_write_concern_error)
File "C:\Users\gwerner004\AppData\Local\Programs\Python\Python36\lib\site-packages\pymongo\helpers.py", line 155, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Unsupported projection option: $substr
Why do I get a failure for such a basic operation? I am running on Windows 10 and using Python 3.6.7. My PyMongo is 3.7.2
The $substr operator works on all currently supported MongoDB versions (2.6-4.4):
> db.foo.aggregate([{$project:{"name": {"$substr": ["$name", 2, -1]}}}])
{ "_id" : ObjectId("5fc032e56bc5b2e2216cdd08"), "name" : "llo" }
Most likely you are using either an ancient MongoDB installation or, per one of the comments, an imitation database like CosmosDB that does not behave as MongoDB itself would (which are also not supported by official MongoDB drivers).
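One way to test that guess is to ask the server what it reports about itself. A minimal sketch, assuming an already connected pymongo MongoClient; `server_version` is a hypothetical helper name, and it relies on `server_info()` returning the server's build information with a "version" string (the "versionArray" field is read defensively in case it is absent).

```python
def server_version(client):
    """Return (version string, version tuple) as reported by the server."""
    info = client.server_info()
    return info.get("version"), tuple(info.get("versionArray", ()))


# Possible usage (needs a reachable server, so it is left commented out):
# from pymongo import MongoClient
# print(server_version(MongoClient(uri)))
```

A very old version number, or one that does not match what you believe you deployed, would point toward the explanation above.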

.extractText() returns "invalid literal for decimal"

I'm coding something which will read PDFs online and return a set of keywords that are found in the document. However I keep running into a problem with the extractText() function from the PyPDF2 package.
Here's my code to open the PDFs and read them:
x = myurl.pdf
if ".pdf" in x:
    remoteFile = urlopen(Request(x, headers={"User-Agent": "Magic-Browser"})).read()
    memoryFile = StringIO(remoteFile)
    pdfFile = PyPDF2.PdfFileReader(memoryFile, strict=False)
    num_pages = pdfFile.numPages
    count = 0
    text = ""
    while count < num_pages:
        pageObj = pdfFile.getPage(count)
        count += 1
        text += pageObj.extractText()
The error that I keep running into on the extractText() line goes like this:
Traceback (most recent call last):
File "errortest.py", line 30, in <module>
text += pageObj.extractText()
File "/anaconda2/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2595, in extractText
content = ContentStream(content, self.pdf)
File "/anaconda2/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2674, in __init__
self.__parseContentStream(stream)
File "/anaconda2/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2706, in __parseContentStream
operands.append(readObject(stream, None))
File "/anaconda2/lib/python2.7/site-packages/PyPDF2/generic.py", line 98, in readObject
return NumberObject.readFromStream(stream)
File "/anaconda2/lib/python2.7/site-packages/PyPDF2/generic.py", line 271, in readFromStream
return FloatObject(num)
File "/anaconda2/lib/python2.7/site-packages/PyPDF2/generic.py", line 231, in __new__
return decimal.Decimal.__new__(cls, str(value))
File "/anaconda2/lib/python2.7/decimal.py", line 547, in __new__
"Invalid literal for Decimal: %r" % value)
File "/anaconda2/lib/python2.7/decimal.py", line 3872, in _raise_error
raise error(explanation)
decimal.InvalidOperation: Invalid literal for Decimal: '99.-72'
Would be great if someone could help me out! Thanks!
There is too little information to be certain, but PyPDF2 (and now pypdf) improved a lot in 2022. You will probably just need to upgrade to the latest version of pypdf.
If you encounter a bug in pypdf again, please open an issue: https://github.com/py-pdf/pypdf
A good bug ticket contains (1) your pypdf version (2) the code + PDF document that caused the issue.

"InterfaceError: connection already closed" when using multiprocessing.Pool on black box function that queries PostgreSQL database

I've been given a Python (2.7) function that takes 3 strings as arguments, and returns a list of dictionaries. Due to the nature of the project, I can't alter the function, which is quite complex, calling several other non-standard Python modules and querying a PostgreSQL database using psycopg2. I think that it's the Postgres functionality that's causing me problems.
I want to use the multiprocessing module to speed up calling the function hundreds of times. I've written a "helper" function so that I can use multiprocessing.Pool (whose map() passes only 1 argument to the target function) with my function:
from function_script import function
def function_helper(args):
    return function(*args)
And my main code looks like this:
from helper_script import function_helper
from multiprocessing import Pool
argument_a = ['a0', 'a1', ..., 'a99']
argument_b = ['b0', 'b1', ..., 'b99']
argument_c = ['c0', 'c1', ..., 'c99']
input = zip(argument_a, argument_b, argument_c)
p = Pool(4)
results = p.map(function_helper, input)
print results
What I'm expecting is a list of lists of dictionaries, however I get the following errors:
Traceback (most recent call last):
File "/local/python/2.7/lib/python2.7/site-packages/variantValidator/variantValidator.py", line 898, in validator
vr.validate(input_parses)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/validator.py", line 33, in validate
return self._ivr.validate(var, strict) and self._evr.validate(var, strict)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/validator.py", line 69, in validate
(res, msg) = self._ref_is_valid(var)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/validator.py", line 89, in _ref_is_valid
var_x = self.vm.c_to_n(var) if var.type == "c" else var
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/variantmapper.py", line 223, in c_to_n
tm = self._fetch_TranscriptMapper(tx_ac=var_c.ac, alt_ac=var_c.ac, alt_aln_method="transcript")
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/decorators/lru_cache.py", line 176, in wrapper
result = user_function(*args, **kwds)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/variantmapper.py", line 372, in _fetch_TranscriptMapper
self.hdp, tx_ac=tx_ac, alt_ac=alt_ac, alt_aln_method=alt_aln_method)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/transcriptmapper.py", line 69, in __init__
self.tx_identity_info = hdp.get_tx_identity_info(self.tx_ac)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/decorators/lru_cache.py", line 176, in wrapper
result = user_function(*args, **kwds)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/dataproviders/uta.py", line 353, in get_tx_identity_info
rows = self._fetchall(self._queries['tx_identity_info'], [tx_ac])
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/dataproviders/uta.py", line 216, in _fetchall
with self._get_cursor() as cur:
File "/local/python/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/dataproviders/uta.py", line 529, in _get_cursor
cur.execute("set search_path = " + self.url.schema + ";")
File "/local/python/2.7/lib/python2.7/site-packages/psycopg2/extras.py", line 144, in execute
return super(DictCursor, self).execute(query, vars)
DatabaseError: SSL error: decryption failed or bad record mac
And:
Traceback (most recent call last):
File "/local/python/2.7/lib/python2.7/site-packages/variantValidator/variantValidator.py", line 898, in validator
vr.validate(input_parses)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/validator.py", line 33, in validate
return self._ivr.validate(var, strict) and self._evr.validate(var, strict)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/validator.py", line 69, in validate
(res, msg) = self._ref_is_valid(var)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/validator.py", line 89, in _ref_is_valid
var_x = self.vm.c_to_n(var) if var.type == "c" else var
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/variantmapper.py", line 223, in c_to_n
tm = self._fetch_TranscriptMapper(tx_ac=var_c.ac, alt_ac=var_c.ac, alt_aln_method="transcript")
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/decorators/lru_cache.py", line 176, in wrapper
result = user_function(*args, **kwds)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/variantmapper.py", line 372, in _fetch_TranscriptMapper
self.hdp, tx_ac=tx_ac, alt_ac=alt_ac, alt_aln_method=alt_aln_method)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/transcriptmapper.py", line 69, in __init__
self.tx_identity_info = hdp.get_tx_identity_info(self.tx_ac)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/decorators/lru_cache.py", line 176, in wrapper
result = user_function(*args, **kwds)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/dataproviders/uta.py", line 353, in get_tx_identity_info
rows = self._fetchall(self._queries['tx_identity_info'], [tx_ac])
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/dataproviders/uta.py", line 216, in _fetchall
with self._get_cursor() as cur:
File "/local/python/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/dataproviders/uta.py", line 526, in _get_cursor
conn.autocommit = True
InterfaceError: connection already closed
Does anybody know what might cause the Pool function to behave like this, when it seems so simple to use in other examples that I've tried? If this isn't enough information to go on, can anyone advise me on a way of getting to the bottom of the problem (this is the first time I've worked with someone else's code)? Alternatively, are there any other ways that I could use the multiprocessing module to call the function hundreds of times?
Thanks
I think what may be happening is that a single connection object is shared across all workers. When one worker finishes its tasks it closes the connection while the other workers are still running, so the next worker that tries to use the database finds the connection already closed.

How to enable text search in mongo?

I tried so many things..
# in replica set configuration, specify the name of the replica set
# replSet = setname
setParameter=textSearchEnabled=true
This is the relevant part of the config file. Even with this setting, text search is still not enabled.
I am using pymongo for text search.
This is my code
db.command("text", 'tracks', search=request.POST['content_search'], limit=12)['results']
My mongo version is 2.4.10. Please guide me.
This is the traceback
Traceback (most recent call last):
File "/home/nidhin/social-media-widget/env/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 114, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/home/nidhin/social-media-widget/env/local/lib/python2.7/site-packages/django/views/decorators/csrf.py", line 57, in wrapped_view
return view_func(*args, **kwargs)
File "/home/nidhin/social-media-widget/socialmedia/widget/views.py", line 84, in monitor
data = [i['obj'] for i in db.command("text", 'tracks' ,search=request.POST['content_search'], filter = test_data, limit = 12)['results']]
File "/home/nidhin/social-media-widget/env/local/lib/python2.7/site-packages/pymongo/database.py", line 435, in command
uuid_subtype, compile_re, **kwargs)[0]
File "/home/nidhin/social-media-widget/env/local/lib/python2.7/site-packages/pymongo/database.py", line 341, in _command
msg, allowable_errors)
File "/home/nidhin/social-media-widget/env/local/lib/python2.7/site-packages/pymongo/helpers.py", line 178, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
OperationFailure: command SON([('text', 'tracks'), ('filter', {'publisher_desc': u'Blogs'}), ('search', u'box'), ('limit', 12)]) failed: text search not enabled
Adding this line to the config file should work:
setParameter=textSearchEnabled=true
How do you start MongoDB?
Edit:
I recommend checking two things:
mongod was actually started with this config file. You can check by calling db.runCommand("getCmdLineOpts") in the MongoDB shell.
In the MongoDB shell, db.runCommand({getParameter:1, textSearchEnabled: 1}) returns textSearchEnabled: true.
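The same two checks can be run from pymongo. This is a minimal sketch, assuming an already connected client object and that both commands are issued against the admin database; `text_search_status` is a hypothetical helper name.

```python
def text_search_status(client):
    """Report how mongod was started and whether text search is enabled."""
    admin = client.admin
    # How the server process was launched (arguments and parsed options).
    opts = admin.command("getCmdLineOpts")
    # Current value of the textSearchEnabled server parameter.
    param = admin.command("getParameter", 1, textSearchEnabled=1)
    return {
        "argv": opts.get("argv"),
        "parsed": opts.get("parsed"),
        "textSearchEnabled": bool(param.get("textSearchEnabled")),
    }


# Possible usage (needs a running mongod, so it is left commented out):
# from pymongo import MongoClient
# print(text_search_status(MongoClient()))
```

If the reported arguments do not mention your config file, the server is most likely running with a different configuration than the one you edited.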

handle crash of flask app on database restart

I have a Flask application that crashes when I restart the Postgres database, because the cursor that was opened goes stale.
How do I handle this situation? The Flask app currently connects to Postgres via psycopg2.
I am not a database expert.
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1701, in __call__
return self.wsgi_app(environ, start_response)
File "/var/www/flaskapps/capp/override.py", line 15, in __call__
return self.app(environ, start_response)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1689, in wsgi_app
response = self.make_response(self.handle_exception(e))
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1687, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1360, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1356, in full_dispatch_request
rv = self.preprocess_request()
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1539, in preprocess_request
rv = func()
File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 321, in _load_user
self.reload_user()
File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 350, in reload_user
user = self.user_callback(user_id)
File "/var/www/flaskapps/capp/login_setup.py", line 163, in load_user
cursor.execute(qstr)
File "/usr/share/pyshared/psycopg2/extras.py", line 123, in execute
return _cursor.execute(self, query, vars)
InterfaceError: cursor already closed
This is one of the many cases where your code needs to detect a transient failure and retry the transaction, re-opening the connection if necessary.
Other cases include deadlocks and serialization failures.
The sqlstate on the exception will let you determine which error cases to retry and how. See the PostgreSQL documentation on error codes for guidance on the meaning of the sqlstate codes.
Sometimes your database interface will throw a typed exception that tells you enough just by its data type, too. This doesn't look like one of those cases.
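As a rough illustration of the retry idea, here is a minimal sketch. `run_with_retry`, `connect` and `work` are hypothetical names, and the exception classes to retry on are passed in by the caller. With psycopg2 one might pass (psycopg2.InterfaceError, psycopg2.OperationalError) and, for deadlocks or serialization failures, inspect the exception's pgcode (SQLSTATE 40P01 or 40001) before deciding to retry; that part is an assumption about your setup and is not shown here.

```python
import time


def run_with_retry(connect, work, transient_errors, attempts=3, delay=0.5):
    """Run work(conn), reconnecting and retrying on transient errors.

    connect: callable returning a new connection object.
    work: callable that takes the connection and performs one transaction.
    transient_errors: tuple of exception classes considered retryable.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    conn = None
    last_error = None
    for attempt in range(attempts):
        try:
            if conn is None:
                conn = connect()
            return work(conn)
        except transient_errors as exc:
            last_error = exc
            # The connection may be stale: discard it and reconnect next time.
            if conn is not None:
                try:
                    conn.close()
                except Exception:
                    pass
                conn = None
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error
```

In the Flask case, the body of load_user would become the `work` callable, so a stale cursor after a database restart leads to one reconnect instead of a crash.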