Mongo connector error: Unable to process oplog document - mongodb

I am new to neo4j-doc-manager and I am trying to use it to view a collection from my MongoDB as a graph created in Neo4j, as per:
https://neo4j.com/developer/mongodb/
I have my MongoDB and Neo4j instances running locally and I'm using the following command:
mongo-connector -m mongodb://localhost:27017/axa -t http://<user_name>:<password>@localhost:7474/C:/Users/user_name/.Neo4jDesktop/neo4jDatabases/database-c791fa15-9a0d-4051-bb1f-316ec9f1c7df/installation-4.0.3/data/ -d neo4j_doc_manager
However I get an error:
2020-04-17 15:49:47,011 [ERROR] mongo_connector.oplog_manager:309 - **Unable to process oplog document** {'ts': Timestamp(1587118784, 2), 't': 9, 'h': 0, 'v': 2, 'op': 'i', 'ns': 'axa.talks', 'ui': UUID('3245621e-e204-49fc-8350-d9950246fa6c'), 'wall': datetime.datetime(2020, 4, 17, 10, 19, 44, 994000), 'o': {'session': {'title': '12 Years of Spring: An Open Source Journey', 'abstract': 'Spring emerged as a core open source project in early 2003 and evolved to a broad portfolio of open source projects up until 2015.'}, 'topics': ['keynote', 'spring'], 'room': 'Auditorium', 'timeslot': 'Wed 29th, 09:30-10:30', 'speaker': {'name': 'Juergen Hoeller', 'bio': 'Juergen Hoeller is co-founder of the Spring Framework open source project.', 'twitter': 'https://twitter.com/springjuergen', 'picture': 'http://www.springio.net/wp-content/uploads/2014/11/juergen_hoeller-220x220.jpeg'}}}
Traceback (most recent call last):
File "c:\users\user_name\pycharmprojects\axa_experience\venv\lib\site-packages\py2neo\core.py", line 258, in get
response = self.__base.get(headers=headers, redirect_limit=redirect_limit, **kwargs)
File "c:\users\user_name\pycharmprojects\axa_experience\venv\lib\site-packages\py2neo\packages\httpstream\http.py", line 966, in get
return self.__get_or_head("GET", if_modified_since, headers, redirect_limit, **kwargs)
File "c:\users\user_name\pycharmprojects\axa_experience\venv\lib\site-packages\py2neo\packages\httpstream\http.py", line 943, in __get_or_head
return rq.submit(redirect_limit=redirect_limit, **kwargs)
File "c:\users\user_name\pycharmprojects\axa_experience\venv\lib\site-packages\py2neo\packages\httpstream\http.py", line 452, in submit
return Response.wrap(http, uri, self, rs, **response_kwargs)
File "c:\users\user_name\pycharmprojects\axa_experience\venv\lib\site-packages\py2neo\packages\httpstream\http.py", line 489, in wrap
raise inst
**py2neo.packages.httpstream.http.ClientError: 404 Not Found**
Versions used:
Python - 3.8
mongoDB - 4.2.5
neo4j - 4.0.3
I would really appreciate any help in this regard.

I was having the same problem and I think the issue has to do with the version of py2neo. mongo-connector only seems to work with py2neo 2.0.7, but Neo4j 4.0 doesn't work with that version. This is where I got stuck and found no solution. Maybe using Neo4j 3.0 could fix that, but it wouldn't work for me, as I need 4.0 for a fabric database. I've recently started looking into APOC procedures for MongoDB instead. Hope this was helpful.

The doc-manager library you are using requires MongoDB's REST API, which no longer works in newer versions. If you want to use it, run a MongoDB version < 3.2, which still has the REST API active.

Related

Airflow run a DAG to manipulate DB2 data that raises a jaydebeapi.Error

I followed the official Airflow website to produce my Airflow DAG that connects to DB2. When I run a DAG to insert or update data, it raises a jaydebeapi.Error. Even though Airflow raises the error, the data is still inserted/updated in DB2 successfully.
The DAG is marked FAILED on the Airflow UI. I don't know what step I'm missing.
My DAG code snippet:
from airflow import DAG
from airflow.providers.jdbc.operators.jdbc import JdbcOperator

with DAG("my_dag1", default_args=default_args,
         schedule_interval="@daily", catchup=False) as dag:
    creating_table = JdbcOperator(
        task_id='creating_table',
        jdbc_conn_id='db2',
        sql=r"""
        insert into DB2ECIF.T2(C1,C1_DATE) VALUES('TEST',CURRENT DATE);
        """,
        autocommit=True,
    )
DAG log:
[2022-06-20 02:16:03,743] {base.py:68} INFO - Using connection ID 'db2' for task execution.
[2022-06-20 02:16:04,785] {dbapi.py:213} INFO - Running statement:
insert into DB2ECIF.T2(C1,C1_DATE) VALUES('TEST',CURRENT DATE);
, parameters: None
[2022-06-20 02:16:04,842] {dbapi.py:221} INFO - Rows affected: 1
[2022-06-20 02:16:04,844] {taskinstance.py:1889} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/jdbc/operators/jdbc.py", line 76, in execute
return hook.run(self.sql, self.autocommit, parameters=self.parameters, handler=fetch_all_handler)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/hooks/dbapi.py", line 195, in run
result = handler(cur)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/jdbc/operators/jdbc.py", line 30, in fetch_all_handler
return cursor.fetchall()
File "/home/airflow/.local/lib/python3.7/site-packages/jaydebeapi/__init__.py", line 596, in fetchall
row = self.fetchone()
File "/home/airflow/.local/lib/python3.7/site-packages/jaydebeapi/__init__.py", line 561, in fetchone
raise Error()
jaydebeapi.Error
[2022-06-20 02:16:04,847] {taskinstance.py:1400} INFO - Marking task as FAILED. dag_id=my_dag1, task_id=creating_table, execution_date=20210101T000000, start_date=, end_date=20220620T021604
I have installed required python packages of Airflow. List below:
Package(System) name/Version
Airflow/2.3.2
IBM DB2/11.5.7
OpenJDK/15.0.2
JayDeBeApi/1.2.0
JPype1/0.7.2
apache-airflow-providers-jdbc/3.0.0
I have tried the latest versions of JayDeBeApi (1.2.3) and JPype1 (1.4.0), but it still doesn't work.
I have also downgraded Airflow to 2.2.3 and 2.2.5 with the same result.
How can I solve this problem?
The error doesn't happen in the original insert query; it is due to a fetchall introduced in this PR: https://github.com/apache/airflow/pull/23817
Using apache-airflow-providers-jdbc/2.1.3 might be an easy workaround.
To get to the root cause, set the DEBUG logging level in Airflow and see why the fetchall causes the error. Having the full traceback will help.
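A possible mitigation, sketched here rather than taken from the provider's actual API (whether you can pass a custom handler to JdbcOperator depends on the provider version), is a handler that only fetches when the statement produced a result set. The guard itself is plain DB-API and can be demonstrated with sqlite3:

```python
import sqlite3

def safe_fetch_all_handler(cursor):
    # DB-API cursors set .description only when the last statement
    # produced a result set; fetching when it is None is exactly what
    # makes jaydebeapi raise after an INSERT.
    if cursor.description is None:
        return None
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t(c1 TEXT)")
cur.execute("INSERT INTO t(c1) VALUES ('TEST')")
print(safe_fetch_all_handler(cur))   # None: no result set after INSERT
cur.execute("SELECT c1 FROM t")
print(safe_fetch_all_handler(cur))   # [('TEST',)]
```

The same guard is what later provider releases effectively adopted: fetch results only when the cursor reports a result set.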

pymongo atlas loadbalancer error ARM device

I get a LoadBalancerSupportMismatch error when accessing my online Mongo database/cluster from an ARM device (Jetson Xavier) running the Ubuntu 18.04 Jetson image that came with it. The code works on a normal x86 PC; it runs under Python 3.6 on the Jetson (I use 3.8 on the normal PC).
My code is straightforward. I anonymized parts of it.
self.online_client = MongoClient(
    f"mongodb+srv://<user>:<password>@<dbname>.pkphq.mongodb.net/Xcontainers?retryWrites=true&w=majority")
self.cloud_coll = self.online_client[<dbname>][<collection>]
self.cloud_coll.insert_one(some_dict)
The error I get on the jetson is:
File "/usr/local/lib/python3.6/dist-packages/pymongo/collection.py", line 1319, in find_one
for result in cursor.limit(-1):
File "/usr/local/lib/python3.6/dist-packages/pymongo/cursor.py", line 1207, in next
if len(self.__data) or self._refresh():
File "/usr/local/lib/python3.6/dist-packages/pymongo/cursor.py", line 1100, in _refresh
self.__session = self.__collection.database.client._ensure_session()
File "/usr/local/lib/python3.6/dist-packages/pymongo/mongo_client.py", line 1816, in _ensure_session
return self.__start_session(True, causal_consistency=False)
File "/usr/local/lib/python3.6/dist-packages/pymongo/mongo_client.py", line 1766, in __start_session
server_session = self._get_server_session()
File "/usr/local/lib/python3.6/dist-packages/pymongo/mongo_client.py", line 1802, in _get_server_session
return self._topology.get_server_session()
File "/usr/local/lib/python3.6/dist-packages/pymongo/topology.py", line 499, in get_server_session
None)
File "/usr/local/lib/python3.6/dist-packages/pymongo/topology.py", line 217, in _select_servers_loop
(self._error_message(selector), timeout, self.description))
pymongo.errors.ServerSelectionTimeoutError: The server is being accessed through a load balancer, but this driver does not have load balancing enabled, full error: {'ok': 0, 'errmsg': 'The server is being accessed through a load balancer, but this driver does not have load balancing enabled', 'code': 354, 'codeName': 'LoadBalancerSupportMismatch'}, Timeout: 30s, Topology Description: <TopologyDescription id: 61ee9d768a646fd4a74f0849, topology_type: Single, servers: [<ServerDescription ('containers-lb.pkphq.mongodb.net', 27017) server_type: Unknown, rtt: None, error=OperationFailure("The server is being accessed through a load balancer, but this driver does not have load balancing enabled, full error: {'ok': 0, 'errmsg': 'The server is being accessed through a load balancer, but this driver does not have load balancing enabled', 'code': 354, 'codeName': 'LoadBalancerSupportMismatch'}",)>]>
[INFO] [1643027861.848502]: No id to push measurement
My little journey at resolving the issue brought me to pretty much the only thing that seemed relevant: https://www.mongodb.com/community/forums/t/scala-driver-2-9-0-connection-fails-with-loadbalancersupportmismatch/126525/2 . There it appeared the Scala driver was not up to date and needed updating via sbt or Maven: http://mongodb.github.io/mongo-java-driver/4.3/driver-scala/getting-started/installation/
I set up the hardware quite recently and it's up to date, so it's a bit puzzling that the driver wouldn't be.
Looking into the documentation of sbt and Maven, it seems totally unrelated at worst and very complicated at best as a way to get pymongo working properly again with Mongo Atlas.
Is there a better solution to make the load balancer issue go away, or to get my driver up to date?
Had the same issue, fixed it by upgrading pymongo to version 3.12, since this document describes that as the minimum version for serverless clusters (which is my case)
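The version threshold can be checked without connecting; a minimal sketch (the 3.12 cutoff comes from the answer above, and the helper name is mine, not part of pymongo):

```python
def supports_load_balanced_clusters(version: str) -> bool:
    """True if a pymongo version string is >= 3.12, the minimum the
    answer above cites for load-balanced/serverless clusters."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (3, 12)

# apt-installed drivers on an Ubuntu 18.04 Jetson image tend to be old:
print(supports_load_balanced_clusters("3.6.9"))    # False
print(supports_load_balanced_clusters("3.12.0"))   # True
```

Comparing (major, minor) tuples avoids the classic string-comparison trap where "3.6" sorts after "3.12".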
Upgrading the cluster also seemed to work. I first used an M0-M2 cluster; paying more for an M10 somehow fixed the issue.

Using Beautiful Soup on the heroku Application

I am trying to deploy a bot I made in Python using the following libraries:
requests, beautifulsoup4, discord.
It is deployed, I believe, using GitHub and Heroku. The bot deploys successfully; however, when I check the logs, the bot has crashed. Here is the error message:
2020-05-17T23:17:42.624634+00:00 app[api]: Deploy 83c32a30 by user ****************************
2020-05-17T23:17:42.624634+00:00 app[api]: Release v12 created by user ****************************
2020-05-17T23:17:43.134443+00:00 heroku[worker.1]: State changed from crashed to starting
2020-05-17T23:17:48.338694+00:00 heroku[worker.1]: State changed from starting to up
2020-05-17T23:17:51.764352+00:00 heroku[worker.1]: State changed from up to crashed
2020-05-17T23:17:51.660991+00:00 app[worker.1]: Traceback (most recent call last):
2020-05-17T23:17:51.661016+00:00 app[worker.1]: File "BocoBot_Version1.py", line 126, in <module>
2020-05-17T23:17:51.661182+00:00 app[worker.1]: soup = BeautifulSoup(source, 'lxml')
2020-05-17T23:17:51.661184+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/bs4/__init__.py", line 245, in __init__
2020-05-17T23:17:51.661401+00:00 app[worker.1]: % ",".join(features))
**2020-05-17T23:17:51.661423+00:00 app[worker.1]: bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?**
2020-05-17T23:17:57.000000+00:00 app[api]: Build succeeded
I believe this is the issue in question:
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
But I do not know what I need to do to resolve it. My guess is that it has to do with my requirements.txt file, where I tell it which packages to add. But no matter what changes I make to BeautifulSoup4, it still doesn't work.
Here is the requirements.txt file information:
git+https://github.com/Rapptz/discord.py
PyNaCl==1.3.0
pandas
beautifulsoup4
requests
discord
dnspython==1.16.0
async-timeout==3.0.1
Any suggestions would be greatly appreciated and I will be happy to add more information.
Try adding lxml to your requirements.txt.
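That is, the requirements.txt from the question would become (only the lxml line is added):

```
git+https://github.com/Rapptz/discord.py
PyNaCl==1.3.0
pandas
beautifulsoup4
lxml
requests
discord
dnspython==1.16.0
async-timeout==3.0.1
```

Alternatively, BeautifulSoup(source, "html.parser") uses the parser bundled with Python and needs no extra dependency, at the cost of being slower than lxml.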

PyMongo AutoReconnect: timed out

I work in an Azure environment. I have a VM that runs a Django application (Open edX) and a Mongo server on another VM instance (Ubuntu 16.04). Whenever I try to load anything in the application (where the data is fetched from the Mongo server), I would get an error like this one:
Feb 23 12:49:43 xxxxx [service_variant=lms][mongodb_proxy][env:sandbox] ERROR [xxxxx 13875] [mongodb_proxy.py:55] - Attempt 0
Traceback (most recent call last):
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/mongodb_proxy.py", line 53, in wrapper
return func(*args, **kwargs)
File "/edx/app/edxapp/edx-platform/common/lib/xmodule/xmodule/contentstore/mongo.py", line 135, in find
with self.fs.get(content_id) as fp:
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/gridfs/__init__.py", line 159, in get
return GridOut(self.__collection, file_id)
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/gridfs/grid_file.py", line 406, in __init__
self._ensure_file()
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/gridfs/grid_file.py", line 429, in _ensure_file
self._file = self.__files.find_one({"_id": self.__file_id})
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/pymongo/collection.py", line 1084, in find_one
for result in cursor.limit(-1):
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/pymongo/cursor.py", line 1149, in next
if len(self.__data) or self._refresh():
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/pymongo/cursor.py", line 1081, in _refresh
self.__codec_options.uuid_representation))
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/pymongo/cursor.py", line 996, in __send_message
res = client._send_message_with_response(message, **kwargs)
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/pymongo/mongo_client.py", line 1366, in _send_message_with_response
raise AutoReconnect(str(e))
AutoReconnect: timed out
First I thought it was because my Mongo server lived in an instance outside the Django application's virtual network, so I created a new Mongo server on an instance inside the same virtual network, but I still get these errors. Mind you, I do receive the data eventually, but I wouldn't expect timed-out errors if the connection were normal.
If it helps, here's the Ansible playbook that I used to create the Mongo server: https://github.com/edx/configuration/tree/master/playbooks/roles/mongo_3_2
Also, I have tailed the Mongo log file, and this is the only line that appears at the same time I get the timed-out error on the application server:
2018-02-23T12:49:20.890+0000 [conn5] authenticate db: edxapp { authenticate: 1, user: "user", nonce: "xxx", key: "xxx" }
mongostat and mongotop don't show anything out of the ordinary. (htop output omitted.)
I don't know what else to look for or how to fix this issue.
I forgot to change the Mongo server IPs in the Django application settings to point to the new private IP address inside the virtual network instead of the public IP. After I changed that, I don't get the issue anymore.
If you are reading this: if you use that IP address in the Django application settings, make sure the private IP is set to static in Azure.
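As a sketch of the kind of change meant here (the setting name and both addresses are hypothetical, not taken from an actual Open edX config):

```python
# Hypothetical settings fragment: point the Mongo content store at a
# static private IP inside the virtual network, not the public IP.
# Before: "host": "52.170.12.34"  (hypothetical public IP)
DOC_STORE_CONFIG = {
    "host": "10.0.0.5",  # hypothetical static private IP
    "port": 27017,
    "db": "edxapp",
}
```

Traffic to the private address stays inside the virtual network, which avoids the public round trip that was timing out.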

Cursor id not valid at server even with no_cursor_timeout=True

Traceback (most recent call last):
File "from_mongo.py", line 27, in <module>
for sale in pm.events.find({"type":"sale", "date":{"$gt":now-(_60delta+_2delta)}}, no_cursor_timeout=True, batch_size=100):
File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 968, in __next__
if len(self.__data) or self._refresh():
File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 922, in _refresh
self.__id))
File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 838, in __send_message
codec_options=self.__codec_options)
File "/usr/local/lib/python2.7/dist-packages/pymongo/helpers.py", line 110, in _unpack_response
cursor_id)
pymongo.errors.CursorNotFound: cursor id '1025112076089406867' not valid at server
I have also experimented with bigger and smaller batch sizes, and with no no_cursor_timeout at all. I even managed to get this error on a very small collection (200 documents with id and title). It seems to happen when the database is not responsive (heavy inserts). The setup is a cluster of 3 shards, each a 3-member replica set (9 MongoDB instances), MongoDB 3.0.
Based on the line numbers in your traceback, it looks like you're using PyMongo 3, which was released last week. Are you using multiple mongos servers in a sharded cluster? If so, the error is probably a symptom of a critical new bug in PyMongo 3:
https://jira.mongodb.org/browse/PYTHON-898
It will be fixed in PyMongo 3.0.1, which we'll release within a week.
It just hit me: I thought I was using PyMongo 3.0, which has the no_cursor_timeout=True flag, when in fact I was using 2.8.
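For reference, in PyMongo 3.x a no_cursor_timeout=True cursor stays open on the server until it is closed explicitly, so it is usually wrapped in contextlib.closing. A sketch with a stand-in cursor object (with a real collection you would wrap coll.find(..., no_cursor_timeout=True) the same way):

```python
from contextlib import closing

class FakeCursor:
    """Stand-in for a PyMongo cursor, just to demonstrate the pattern."""
    def __init__(self, docs):
        self._docs = docs
        self.closed = False
    def __iter__(self):
        return iter(self._docs)
    def close(self):
        self.closed = True

cursor = FakeCursor([{"type": "sale"}, {"type": "sale"}])
with closing(cursor) as c:  # closing() calls c.close() on exit
    processed = sum(1 for _ in c)

print(processed)       # 2
print(cursor.closed)   # True
```

closing() works with any object that has a close() method, so the same pattern applies unchanged to a real PyMongo cursor.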