Mongo connector with neo4j doc manager crashing

I'm using mongo-connector and neo4j_doc_manager to sync MongoDB data to Neo4j. It used to work perfectly, but today it started giving the following error.
2016-07-29 17:18:59,558 [CRITICAL] mongo_connector.oplog_manager:549 - Exception during collection dump
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/mongo_connector/oplog_manager.py", line 501, in do_dump
upsert_all(dm)
File "/usr/local/lib/python2.7/site-packages/mongo_connector/oplog_manager.py", line 485, in upsert_all
dm.bulk_upsert(docs_to_dump(namespace), mapped_ns, long_ts)
File "/usr/local/lib/python2.7/site-packages/mongo_connector/util.py", line 38, in wrapped
reraise(new_type, exc_value, exc_tb)
File "/usr/local/lib/python2.7/site-packages/mongo_connector/util.py", line 32, in wrapped
return f(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/mongo_connector/doc_managers/neo4j_doc_manager.py", line 89, in bulk_upsert
tx.commit()
File "/usr/local/lib/python2.7/site-packages/py2neo/cypher/core.py", line 306, in commit
return self.post(self.__commit or self.__begin_commit)
File "/usr/local/lib/python2.7/site-packages/py2neo/cypher/core.py", line 261, in post
raise self.error_class.hydrate(error)
File "/usr/local/lib/python2.7/site-packages/py2neo/cypher/error/core.py", line 54, in hydrate
error_cls = getattr(error_module, title)
Neo4jOperationFailed: 'module' object has no attribute 'ConstraintValidationFailed'
2016-07-29 17:18:59,563 [ERROR] mongo_connector.oplog_manager:557 - OplogThread: Failed during dump collection cannot recover!

You're trying to insert data that violates a constraint in your Neo4j schema (uniqueness or existence). Apparently the code doesn't know how to handle that type of error, though it does give its name:
ConstraintValidationFailed
You may want to enable logging to see the data it is trying to insert, or the Cypher query it is trying to execute.
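The odd wording of the final message ('module' object has no attribute 'ConstraintValidationFailed') is consistent with the getattr(error_module, title) line in the traceback: the lookup of the error class by name fails before the real Neo4j error can be raised. A minimal sketch of that failure mode, using a stand-in namespace rather than py2neo's actual module (error_module, CypherError and resolve_error_class are hypothetical names for illustration only):

```python
import types

# Hypothetical stand-in for a driver's error module. It deliberately does
# not define ConstraintValidationFailed, mirroring the traceback above.
CypherError = type("CypherError", (Exception,), {})
error_module = types.SimpleNamespace(CypherError=CypherError)


def resolve_error_class(module, title, default):
    """Look up an error class by name, falling back to a default class."""
    return getattr(module, title, default)


# A bare getattr raises AttributeError and hides the server-side error.
try:
    getattr(error_module, "ConstraintValidationFailed")
except AttributeError as exc:
    lookup_failure = str(exc)

# With a default, the caller still gets an exception class it can raise.
fallback = resolve_error_class(
    error_module, "ConstraintValidationFailed", CypherError
)
```

Either way, the underlying problem remains the constraint violation named in the message; the lookup failure only obscures it.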

Wal-e: unable to push backups - permission error

We get the following error when we try to push backup using wal-e:
2020-07-16T21:18:55Z <Greenlet at 0x7f2a59fadc48: <wal_e.worker.upload.PartitionUploader object at 0x7f2a59f96cc0>([ExtendedTarInfo(submitted_path='/var/lib/postgres)> failed with PermissionError
wal_e.operator.backup WARNING MSG: blocking on sending WAL segments
DETAIL: The backup was not completed successfully, but we have to wait anyway. See README: TODO about pg_cancel_backup
STRUCTURED: time=2020-07-16T21:18:55.651073-00 pid=19697
wal_e.main CRITICAL MSG: An unprocessed exception has avoided all error handling
DETAIL: Traceback (most recent call last):
File "/var/lib/postgresql/virtualenvs/wal-e/lib/python3.5/site-packages/wal_e/operator/backup.py", line 197, in database_backup
**kwargs)
File "/var/lib/postgresql/virtualenvs/wal-e/lib/python3.5/site-packages/wal_e/operator/backup.py", line 500, in _upload_pg_cluster_dir
pool.put(tpart)
File "/var/lib/postgresql/virtualenvs/wal-e/lib/python3.5/site-packages/wal_e/worker/upload_pool.py", line 108, in put
self._wait()
File "/var/lib/postgresql/virtualenvs/wal-e/lib/python3.5/site-packages/wal_e/worker/upload_pool.py", line 65, in _wait
raise val
File "src/gevent/greenlet.py", line 766, in gevent._greenlet.Greenlet.run
File "/var/lib/postgresql/virtualenvs/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py", line 96, in __call__
gpg_key=self.gpg_key) as pl:
File "/var/lib/postgresql/virtualenvs/wal-e/lib/python3.5/site-packages/wal_e/pipeline.py", line 92, in __enter__
self.stdin = pipebuf.NonBlockBufferedWriter(stdin)
File "/var/lib/postgresql/virtualenvs/wal-e/lib/python3.5/site-packages/wal_e/pipebuf.py", line 225, in __init__
_setup_fd(self._fd)
File "/var/lib/postgresql/virtualenvs/wal-e/lib/python3.5/site-packages/wal_e/pipebuf.py", line 62, in _setup_fd
set_buf_size(fd)
File "/var/lib/postgresql/virtualenvs/wal-e/lib/python3.5/site-packages/wal_e/pipebuf.py", line 53, in set_buf_size
fcntl.fcntl(fd, fcntl.F_SETPIPE_SZ, OS_PIPE_SZ)
PermissionError: [Errno 1] Operation not permitted
It's not clear why the fcntl call would raise a PermissionError.
PostgreSQL version: 9.6, Python: 3.5, Wal-e : 1.1.1 (tried also 1.0.3 and 1.1.0).
It was working previously and stopped working at some point (without any noticeable changes).
Well, I'm late to the game. See https://github.com/wal-e/wal-e/issues/270
I've worked around it by patching wal-e so that it does not set the pipe buffer size (the F_SETPIPE_SZ call at the bottom of the traceback).
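For illustration, a tolerant version of that call might look like the sketch below. It is not the actual wal-e patch; it only shows the idea of treating a refused pipe resize as non-fatal. On Linux, F_SETPIPE_SZ can be refused for an unprivileged process, for example when the requested size exceeds /proc/sys/fs/pipe-max-size or the per-user pipe page limits. The helper name try_set_pipe_size and the 1031 fallback (the Linux value of F_SETPIPE_SZ, used because older Python versions do not export the constant) are assumptions of this sketch.

```python
import fcntl
import os

# Older Python versions do not export this constant; 1031 is its Linux value.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)


def try_set_pipe_size(fd, size):
    """Try to resize the pipe behind fd; return True on success.

    A refusal (PermissionError is a subclass of OSError) is reported as
    False instead of propagating, so the caller keeps the default size.
    """
    try:
        fcntl.fcntl(fd, F_SETPIPE_SZ, size)
        return True
    except OSError:
        return False


# Example: attempt a resize on a fresh pipe, then clean up.
read_fd, write_fd = os.pipe()
resized = try_set_pipe_size(write_fd, 64 * 1024)
os.close(read_fd)
os.close(write_fd)
```

If the resize is refused, the pipe simply keeps its default capacity, which affects throughput rather than correctness.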

Airflow initdb slot_pool does not exist

I'm facing an issue with airflow initialization on postgres backend
Ubuntu : 18.04.1
Airflow : v1.10.6
Postgres : 10.10
Python 3.6
And when I run
airflow initdb
I get
[2019-11-22 10:17:23,564] {db.py:368} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1246, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/default.py", line 581, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.UndefinedTable: relation "airflow.slot_pool" does not exist
LINE 2: FROM airflow.slot_pool
^
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 37, in <module>
args.func(args)
File "/usr/local/lib/python3.6/dist-packages/airflow/bin/cli.py", line 1131, in initdb
db.initdb(settings.RBAC)
File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 106, in initdb
upgradedb()
File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 377, in upgradedb
add_default_pool_if_not_exists()
File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 90, in add_default_pool_if_not_exists
if not Pool.get_pool(Pool.DEFAULT_POOL_NAME, session=session):
File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 70, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/airflow/models/pool.py", line 44, in get_pool
return session.query(Pool).filter(Pool.pool == pool_name).first()
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/query.py", line 3265, in first
ret = list(self[0:1])
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/query.py", line 3043, in __getitem__
return list(res)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/query.py", line 3367, in __iter__
return self._execute_and_instances(context)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/query.py", line 3392, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 982, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1101, in _execute_clauseelement
distilled_params,
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1250, in _execute_context
e, statement, parameters, cursor, context
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1476, in _handle_dbapi_exception
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/util/compat.py", line 152, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1246, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/default.py", line 581, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "airflow.slot_pool" does not exist
LINE 2: FROM airflow.slot_pool
^
[SQL: SELECT airflow.slot_pool.id AS airflow_slot_pool_id, airflow.slot_pool.pool AS airflow_slot_pool_pool, airflow.slot_pool.slots AS airflow_slot_pool_slots, airflow.slot_pool.description AS airflow_slot_pool_description
FROM airflow.slot_pool
WHERE airflow.slot_pool.pool = %(pool_1)s
LIMIT %(param_1)s]
[parameters: {'pool_1': 'default_pool', 'param_1': 1}]
(Background on this error at: http://sqlalche.me/e/f405)
I've tried deleting and recreating the database and the user rights (with search_path set as described in the docs). My Postgres instance is reachable and correctly configured, since Airflow had already created tables in it before the crash ;)
Any ideas?
I tried Airflow 1.10.2 and it works smoothly with the Postgres backend.
This might be because of the examples. Try setting load_examples = False in airflow.cfg
and run airflow upgradedb or airflow resetdb
Have you tried
airflow resetdb
airflow initdb: Initialize the metadata database
airflow resetdb: Burn down and rebuild the metadata database
airflow upgradedb: Apply missing migrations. This is also idempotent and safe, but it tracks migrations, so if your tables aren't in the state that alembic thinks they are in, you would need to find that "state" and edit it.
Most importantly, if you have any connections or variables set, they will be deleted if you run airflow resetdb.
Airflow uses SQLite as its default backend and creates its default tables (like slot_pool) there. So if you switch to another backend such as Postgres, you need to create these default tables there as well:
$ airflow initdb
I know it's kind of late, but I experienced the same thing when running $ airflow db init against a MySQL 8 backend, and got a similar error.
It turned out that there was a mismatch in my configuration.
The database was named airflow_db, so I had
sql_alchemy_conn = mysql+mysqlconnector://airflow:******@localhost:3306/airflow_db
but a few lines down in airflow.cfg, I also had the line
sql_alchemy_schema = airflow
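A quick way to spot this kind of mismatch is to compare the database name in sql_alchemy_conn with sql_alchemy_schema. The sketch below does that with the standard library only. It assumes both options live in the [core] section (the section name is a parameter because it differs between Airflow versions) and it only makes sense for MySQL, where "schema" and "database" mean the same thing; on Postgres the schema is a namespace inside the database and will legitimately differ. The function name schema_mismatch and the sample password are hypothetical.

```python
import configparser
from urllib.parse import urlsplit


def schema_mismatch(cfg_text, section="core"):
    """Return (mismatch, db_name, schema) for an airflow.cfg given as text."""
    # interpolation=None: treat '%' characters in the file literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    conn = parser.get(section, "sql_alchemy_conn")
    schema = parser.get(section, "sql_alchemy_schema", fallback="")
    # The database name is the path component of the connection URL.
    db_name = urlsplit(conn).path.lstrip("/")
    return bool(schema) and schema != db_name, db_name, schema


sample_cfg = """
[core]
sql_alchemy_conn = mysql+mysqlconnector://airflow:secret@localhost:3306/airflow_db
sql_alchemy_schema = airflow
"""
result = schema_mismatch(sample_cfg)
```

For the configuration described above, the check reports a mismatch between airflow_db and airflow, which matches the "airflow.slot_pool does not exist" style of error: the tables are looked up under a schema name that does not hold them.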

Librabbitmq 2.0.0 with Python 3 gives TypeError: can't pickle memoryview objects

I am using the latest master branch of the git repo https://github.com/celery/librabbitmq and installing librabbitmq==2.0.0 for Python 3.6 by following the instructions in the readme
Using the development version
You can clone the repository by doing the following:
$ git clone git://github.com/celery/librabbitmq.git
Then install it by doing the following:
$ cd librabbitmq
$ make install # or make develop
This works fine (after installing certain binaries for C compilation in the OS), but when I then create a small a+b add task and call it with add.delay(2,2), it fails with the following error. I looked it up and saw that Celery 4 uses JSON as its serializer, so clearly it is not because of pickle serialization.
Changing from librabbitmq to pyamqp broker works normally
Same exact situation in both MacOS and Ubuntu 16
[2018-04-30 23:40:02,956: CRITICAL/MainProcess] Unrecoverable error: SystemError(' returned a result with an error set',)
Traceback (most recent call last):
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/messaging.py", line 624, in _receive_callback
return on_m(message) if on_m else self.receive(decoded, message)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 570, in on_task_received
callbacks,
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/strategy.py", line 145, in task_message_handler
handle(req)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/worker.py", line 221, in _process_task_sem
return self._quick_acquire(self._process_task, req)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/async/semaphore.py", line 62, in acquire
callback(*partial_args, **partial_kwargs)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/worker.py", line 226, in _process_task
req.execute_using_pool(self.pool)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/request.py", line 531, in execute_using_pool
correlation_id=task_id,
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/concurrency/base.py", line 155, in apply_async
**options)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/billiard/pool.py", line 1486, in apply_async
self._quick_put((TASK, (result._job, None, func, args, kwds)))
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/concurrency/asynpool.py", line 813, in send_job
body = dumps(tup, protocol=protocol)
TypeError: can't pickle memoryview objects
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/bootsteps.py", line 370, in start
return self.obj.start()
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 320, in start
blueprint.start(self)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 596, in start
c.loop(*c.loop_args())
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/celery/worker/loops.py", line 88, in asynloop
next(loop)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/async/hub.py", line 354, in create_loop
cb(*cbargs)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/transport/base.py", line 236, in on_readable
reader(loop)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/kombu/transport/base.py", line 218, in _read
drain_events(timeout=0)
File "/Users/somghosh/.virtualenvs/ctdb/lib/python3.6/site-packages/librabbitmq-2.0.0-py3.6-macosx-10.6-intel.egg/librabbitmq/init.py", line 227, in drain_events
self._basic_recv(timeout)
SystemError: returned a result with an error set
This library is not recommended as the RabbitMQ transport for Celery. Please try py-amqp instead; it is better maintained and less buggy.
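The TypeError in the first traceback can be reproduced with the standard library alone: pickle refuses memoryview objects, and the traceback shows dumps(tup, protocol=protocol) being called on the job tuple handed to the worker pool. A minimal sketch follows. That the memoryview comes from the message body delivered by librabbitmq is an assumption here, not something the traceback proves, and can_pickle is a hypothetical helper.

```python
import pickle


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False on TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


body = memoryview(b"[[2, 2], {}, {}]")  # stand-in for a message body

view_ok = can_pickle(body)          # memoryview objects cannot be pickled
bytes_ok = can_pickle(bytes(body))  # a bytes copy of the same data can
```

This is why the task serializer setting (JSON) is beside the point: the pickling happens when the already received message is passed to the pool process, not when the task is serialized for the broker.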

Error while retrieving result in pymongo

I have a Python application that creates a number of threads for a job. Each thread connects to MongoDB and retrieves data. The number of allowed connections to MongoDB is 200, which I enforce with a semaphore. Once its query is done, each thread closes its MongoDB connection. But when I run the application, I get the same error in every thread:
Traceback (most recent call last):
File "C:\Python34\lib\threading.py", line 921, in _bootstrap_inner
self.run()
File "C:\Python34\lib\threading.py", line 869, in run
self._target(*self._args, **self._kwargs)
File "C:/path/pytest/under_construction/testAlgo.py", line 95, in sample_thread
status=monObj.process_status(list_value1,list_value2,5,120,120)
File "C:\path\pytest\under_construction\mongo_lib.py", line 153, in process_status
result=self.mongo_result('Submission','find',q={})
File "C:\path\pytest\under_construction\mongo_lib.py", line 53, in mongo_result
result=list(_query[query_type.lower()](query_string[keys]))
File "C:\Python34\lib\site-packages\pymongo\cursor.py", line 1076, in __next__
if len(self.__data) or self._refresh():
File "C:\Python34\lib\site-packages\pymongo\cursor.py", line 1037, in _refresh
limit, self.__id))
File "C:\Python34\lib\site-packages\pymongo\cursor.py", line 933, in __send_message
res = client._send_message_with_response(message, **kwargs)
File "C:\Python34\lib\site-packages\pymongo\mongo_client.py", line 1205, in _send_message_with_response
response = self.__send_and_receive(message, sock_info)
File "C:\Python34\lib\site-packages\pymongo\mongo_client.py", line 1182, in __send_and_receive
return self.__receive_message_on_socket(1, request_id, sock_info)
File "C:\Python34\lib\site-packages\pymongo\mongo_client.py", line 1174, in __receive_message_on_socket
return self.__receive_data_on_socket(length - 16, sock_info)
File "C:\Python34\lib\site-packages\pymongo\mongo_client.py", line 1153, in __receive_data_on_socket
chunk = sock_info.sock.recv(length)
MemoryError
Code for creating the Mongo connection:
client=MongoClient(mc_name,port)
Could this error be caused by the results of all threads accumulating on a single port of the machine running my application?
MongoClient is a thread-safe connection pool, so you should be creating a single instance that's shared by all the worker threads rather than having each thread create its own.
The connection pool size defaults to 100, but if you want to make it even larger you can use the maxPoolSize parameter to do that (e.g. maxPoolSize=200).
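A minimal sketch of that advice: create one MongoClient and hand the same instance to every worker thread. MongoClient and its maxPoolSize option are real PyMongo API; the helper names make_shared_client and run_workers, and the database name in the usage comment, are hypothetical. The pymongo import is kept inside the factory so the threading helper can be tried without a MongoDB installation.

```python
import threading


def make_shared_client(host, port, max_pool_size=200):
    """Create the single MongoClient that all threads will share."""
    from pymongo import MongoClient  # requires the pymongo package
    return MongoClient(host, port, maxPoolSize=max_pool_size)


def run_workers(client, work, n_threads):
    """Run work(client, i) in n_threads threads sharing one client."""
    results = [None] * n_threads

    def target(i):
        results[i] = work(client, i)

    threads = [
        threading.Thread(target=target, args=(i,)) for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Usage (assumes a reachable server and a database called "mydb"):
# client = make_shared_client(mc_name, port)
# counts = run_workers(
#     client, lambda c, i: c["mydb"]["Submission"].count_documents({}), 10
# )
# client.close()  # close once, after all threads have finished
```

With a shared client, the pool limits the number of open sockets, so the separate semaphore and the per-thread connect and close calls are no longer needed.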

"pysolr.SolrError: [Reason: /solr4/update/]" when running mongo_connector.py

As a follow-on to a problem I had before (How long does mongo_connector.py usually take?),
I was wondering if anyone else has had this problem when running the following:
$ python /usr/local/lib/python2.7/dist-packages/mongo-connector/mongo_connector.py -m localhost:27017 --docManager /usr/local/lib/python2.7/dist-packages/mongo-connector/doc_managers/solr_doc_manager.py -t http://localhost:8080/solr4
This is the error output I get:
2012-08-20 10:24:11,893 - INFO - Beginning Mongo Connector
2012-08-20 10:24:12,971 - INFO - Starting new HTTP connection (1): localhost
2012-08-20 10:24:12,974 - INFO - Finished 'http://localhost:8080/solr4/update/?commit=true' (post) with body 'u'<commit ' in 0.017 seconds.
2012-08-20 10:24:12,983 - ERROR - [Reason: /solr4/update/]
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/mongo-connector/mongo_connector.py", line 441, in <module>
auth_username=options.admin_name)
File "/usr/local/lib/python2.7/dist-packages/mongo-connector/mongo_connector.py", line 100, in __init__
unique_key=u_key)
File "/usr/local/lib/python2.7/dist-packages/mongo-connector/doc_managers/solr_doc_manager.py", line 54, in __init__
self.run_auto_commit()
File "/usr/local/lib/python2.7/dist-packages/mongo-connector/doc_managers/solr_doc_manager.py", line 95, in run_auto_commit
self.solr.commit()
File "/usr/local/lib/python2.7/dist-packages/pysolr.py", line 802, in commit
return self._update(msg, waitFlush=waitFlush, waitSearcher=waitSearcher)
File "/usr/local/lib/python2.7/dist-packages/pysolr.py", line 359, in _update
return self._send_request('post', path, message, {'Content-type': 'text/xml; charset=utf-8'})
File "/usr/local/lib/python2.7/dist-packages/pysolr.py", line 293, in _send_request
raise SolrError(error_message)
pysolr.SolrError: [Reason: /solr4/update/]
The message [Reason: /solr4/update/] is not really output that I can even start to debug. Solr is working perfectly fine, and MongoDB is working perfectly fine. What could be causing this problem?
I have been following the instructions on this page up to now: http://loutilities.wordpress.com/2012/11/26/complementing-mongodb-with-real-time-solr-search/#comment-183. I've also seen on various websites that adding the following to my Solr's solrconfig.xml should make 'update' accessible, but this is already configured on my system:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
That's about all the information I have. Any hints as to what I might be doing wrong?