Limit mongo-connector to a specific collection for Solr indexing - mongodb

I am currently attempting to use mongo-connector to automatically feed database updates to Solr. It works fine with the following command:
mongo-connector -m localhost:27017 -t http://localhost:8983/solr -d mongo_connector/doc_managers/solr_doc_manager.py
However, it indexes every collection in my MongoDB instance. I have tried restricting it with the -n option, as follows:
mongo-connector -m localhost:27017 -t http://localhost:8983/solr -n feed_scraper_development.articles -d mongo_connector/doc_managers/solr_doc_manager.py
This fails with the following error:
2014-07-24 22:23:23,053 - INFO - Beginning Mongo Connector
2014-07-24 22:23:23,104 - INFO - Starting new HTTP connection (1): localhost
2014-07-24 22:23:23,110 - INFO - Finished 'http://localhost:8983/solr/admin/luke?show=schema&wt=json' (get) with body '' in 0.018 seconds.
2014-07-24 22:23:23,115 - INFO - OplogThread: Initializing oplog thread
2014-07-24 22:23:23,116 - INFO - MongoConnector: Starting connection thread MongoClient('localhost', 27017)
2014-07-24 22:23:23,126 - INFO - Finished 'http://localhost:8983/solr/update/?commit=true' (post) with body 'u'<commit ' in 0.006 seconds.
2014-07-24 22:23:23,129 - INFO - Finished 'http://localhost:8983/solr/select/?q=%2A%3A%2A&sort=_ts+desc&rows=1&wt=json' (get) with body '' in 0.003 seconds.
2014-07-24 22:23:23,337 - INFO - Finished 'http://localhost:8983/solr/select/?q=_ts%3A+%5B6038164010275176560+TO+6038164010275176560%5D&rows=100000000&wt=json' (get) with body '' in 0.207 seconds.
Exception in thread Thread-2:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 808, in __bootstrap_inner
self.run()
File "build/bdist.macosx-10.9-intel/egg/mongo_connector/oplog_manager.py", line 141, in run
cursor = self.init_cursor()
File "build/bdist.macosx-10.9-intel/egg/mongo_connector/oplog_manager.py", line 582, in init_cursor
cursor = self.get_oplog_cursor(timestamp)
File "build/bdist.macosx-10.9-intel/egg/mongo_connector/oplog_manager.py", line 361, in get_oplog_cursor
timestamp = self.rollback()
File "build/bdist.macosx-10.9-intel/egg/mongo_connector/oplog_manager.py", line 664, in rollback
if doc['ns'] in rollback_set:
KeyError: 'ns'
Any help or clues would be greatly appreciated!
Extra information: Solr 4.9.0 | MongoDB 2.6.3 | mongo-connector 1.2.1

It works as advertised after deleting all the indexes in the Solr data folder, restarting Solr, and re-running the command with the -n option.
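For anyone who wants the exact steps, here is a rough sketch (the Solr URL and namespace are the ones from the commands above; clearing the index through Solr's update handler, as below, is just an alternative to deleting the index files by hand, and it assumes you really do want to wipe the whole index first):
curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><query>*:*</query></delete>'
Then restart Solr and re-run the connector limited to the single namespace:
mongo-connector -m localhost:27017 -t http://localhost:8983/solr -n feed_scraper_development.articles -d mongo_connector/doc_managers/solr_doc_manager.py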

Related

pymongo ServerSelectionTimeoutError

I'm following a tutorial on docker-compose with Flask and MongoDB. When I run "docker-compose up", everything is fine. But when I send a POST request to insert a JSON document into the database, I get a timeout error like this:
pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused, Timeout: 30s
I guess that I haven't started the database yet. But when I tried to put the command "sudo systemctl stop mongod" into the docker-compose.yml file, I got this error:
db_1 | /usr/local/bin/docker-entrypoint.sh: line 363: exec: sudo: not found
Does anybody know how to fix it?
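One way to check this (a sketch, assuming the MongoDB service in docker-compose.yml is the one that appears as db_1 in the log, i.e. it is named db): inside Compose, containers reach each other by service name rather than localhost, and there is no systemd inside the containers, so sudo systemctl will not work there.
docker-compose up -d db     # start only the database service in the background
docker-compose logs db      # check that mongod is up and accepting connections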

mongo-connector not connecting with Solr - Exception during collection dump

I am connecting MongoDB with Solr, following this document for the integration:
https://blog.toadworld.com/2017/02/03/indexing-mongodb-data-in-apache-solr
DB.Collection: solr.wlslog
D:\path to solr\bin>
mongo-connector --unique-key=id -n solr.wlslog -m localhost:27017 -t http://localhost:8983/solr/wlslog -d solr_doc_manager
I am getting the response and error below:
2020-06-15 12:15:45,744 [ALWAYS] mongo_connector.connector:50 - Starting mongo-connector version: 3.1.1
2020-06-15 12:15:45,744 [ALWAYS] mongo_connector.connector:50 - Python version: 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:37:02) [MSC v.1924 64 bit (AMD64)]
2020-06-15 12:15:45,745 [ALWAYS] mongo_connector.connector:50 - Platform: Windows-10-10.0.18362-SP0
2020-06-15 12:15:45,745 [ALWAYS] mongo_connector.connector:50 - pymongo version: 3.10.1
2020-06-15 12:15:45,755 [ALWAYS] mongo_connector.connector:50 - Source MongoDB version: 4.2.2
2020-06-15 12:15:45,755 [ALWAYS] mongo_connector.connector:50 - Target DocManager: mongo_connector.doc_managers.solr_doc_manager version: 0.1.0
2020-06-15 12:15:45,787 [CRITICAL] mongo_connector.oplog_manager:713 - Exception during collection dump
Traceback (most recent call last):
File "C:\Users\ancubate\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\mongo_connector\doc_managers\solr_doc_manager.py", line 292, in
batch = list(next(cleaned) for i in range(self.chunk_size))
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\ancubate\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\mongo_connector\oplog_manager.py", line 668, in do_dump
upsert_all(dm)
File "C:\Users\ancubate\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\mongo_connector\oplog_manager.py", line 651, in upsert_all
dm.bulk_upsert(docs_to_dump(from_coll), mapped_ns, long_ts)
File "C:\Users\ancubate\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\mongo_connector\util.py", line 33, in wrapped
return f(*args, **kwargs)
File "C:\Users\ancubate\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\mongo_connector\doc_managers\solr_doc_manager.py", line 292, in bulk_upsert
batch = list(next(cleaned) for i in range(self.chunk_size))
RuntimeError: generator raised StopIteration
2020-06-15 12:15:45,801 [ERROR] mongo_connector.oplog_manager:723 - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, replicaset='rs0'), 'local'), 'oplog.rs')
2020-06-15 12:15:46,782 [ERROR] mongo_connector.connector:408 - MongoConnector: OplogThread <OplogThread(Thread-2, started 4936)> unexpectedly stopped! Shutting down
I searched through the GitHub issues of mongo-connector but did not find a solution:
Github-issue-870
Github-issue-898
Finally, the issue is resolved :)
My OS is Windows and I have installed MongoDB in C:\Program Files\MongoDB\ (the system drive).
Before making the mongo-connector connection, I had initiated a replica set for MongoDB using the command below, as per this blog:
mongod --port 27017 --dbpath ../data/db --replSet rs0
Problem:
The problem was the --dbpath ../data/db directory. It resolved to C:\Program Files\MongoDB\Server\4.2\data\db; that directory itself had full permissions, but the parent directory C:\Program Files did not, because it is a protected system directory.
The actual symptom was the exception during the collection dump:
2020-06-15 12:15:45,787 [CRITICAL] mongo_connector.oplog_manager:713 - Exception during collection dump
Solution:
I just changed my --dbpath to a directory outside of the protected system directory, as below:
mongod --port 27017 --dbpath C:/data/db --replSet rs0
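(For completeness: starting mongod with --replSet only configures the process; the replica set itself still has to be initiated once from the mongo shell, for example with something like the following, assuming a single-node set on the default port as in the blog above.)
mongo --port 27017 --eval "rs.initiate()"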
After that, I executed the command below for the connection, as posted in my question:
mongo-connector --unique-key=id -n solr.wlslog -m localhost:27017 -t http://localhost:8983/solr/wlslog -d solr_doc_manager
The successful mongo-connector log output:
2020-06-17 12:08:52,292 [ALWAYS] mongo_connector.connector:50 - Starting mongo-connector version: 3.1.1
2020-06-17 12:08:52,292 [ALWAYS] mongo_connector.connector:50 - Python version: 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:37:02) [MSC v.1924 64 bit (AMD64)]
2020-06-17 12:08:52,293 [ALWAYS] mongo_connector.connector:50 - Platform: Windows-10-10.0.18362-SP0
2020-06-17 12:08:52,293 [ALWAYS] mongo_connector.connector:50 - pymongo version: 3.10.1
2020-06-17 12:08:52,310 [ALWAYS] mongo_connector.connector:50 - Source MongoDB version: 4.2.2
2020-06-17 12:08:52,311 [ALWAYS] mongo_connector.connector:50 - Target DocManager: mongo_connector.doc_managers.solr_doc_manager version: 0.1.0
Hope this answer is helpful for everyone :)
In my case, this didn't solve the problem.
I'm using Python 3.8, so for me it was actually due to https://docs.python.org/3/whatsnew/3.7.html#changes-in-python-behavior:
PEP 479 is enabled for all code in Python 3.7, meaning that
StopIteration exceptions raised directly or indirectly in coroutines
and generators are transformed into RuntimeError exceptions.
(Contributed by Yury Selivanov in bpo-32670.)
Reading How yield catches StopIteration exception? initially led me to think it was related to the yield statements, but the problem was actually the two places that call next() in solr_doc_manager.py (line 292 and a similar line a few lines further down):
batch = list(next(cleaned) for i in range(self.chunk_size))
changed to:
batch = []
for i in range(self.chunk_size):
    try:
        batch.append(next(cleaned))
    except StopIteration:
        # PEP 479: handle exhaustion explicitly instead of letting
        # StopIteration escape and become a RuntimeError
        break

Mongo connector and Neo4j doc manager show no graph

I have installed the Neo4j doc manager as per the documentation. When I try to sync my MongoDB data using the command below, it waits indefinitely:
Python35-32>mongo-connector -m localhost:27017 -t http://localhost:7474/db/data -d neo4j_doc_manager
Logging to mongo-connector.log.
The content of mongo-connector.log is as follows:
2016-02-26 19:10:11,809 [ERROR] mongo_connector.doc_managers.neo4j_doc_manager:70 - Bulk
The content of oplog.timestamp is as follows:
["Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, replicaset='myDevReplSet'), 'local'), 'oplog.rs')", 6255589333701492738]
EDIT:
If I start mongo-connector with the -v option, the mongo-connector.log file looks like this:
2016-02-29 15:17:18,964 [INFO] mongo_connector.connector:1040 - Beginning Mongo Connector
2016-02-29 15:17:19,005 [INFO] mongo_connector.oplog_manager:89 - OplogThread: Initializing oplog thread
2016-02-29 15:17:23,060 [INFO] mongo_connector.connector:295 - MongoConnector: Starting connection thread MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, replicaset='myDevReplSet')
2016-02-29 15:17:23,061 [DEBUG] mongo_connector.oplog_manager:158 - OplogThread: Run thread started
2016-02-29 15:17:23,061 [DEBUG] mongo_connector.oplog_manager:160 - OplogThread: Getting cursor
2016-02-29 15:17:23,062 [DEBUG] mongo_connector.oplog_manager:670 - OplogThread: reading last checkpoint as Timestamp(1456492891, 2)
2016-02-29 15:17:23,062 [DEBUG] mongo_connector.oplog_manager:654 - OplogThread: oplog checkpoint updated to Timestamp(1456492891, 2)
2016-02-29 15:17:23,068 [DEBUG] mongo_connector.oplog_manager:178 - OplogThread: Got the cursor, count is 1
2016-02-29 15:17:23,069 [DEBUG] mongo_connector.oplog_manager:185 - OplogThread: about to process new oplog entries
2016-02-29 15:17:23,069 [DEBUG] mongo_connector.oplog_manager:188 - OplogThread: Cursor is still alive and thread is still running.
2016-02-29 15:17:23,069 [DEBUG] mongo_connector.oplog_manager:194 - OplogThread: Iterating through cursor, document number in this cursor is 0
2016-02-29 15:17:24,094 [DEBUG] mongo_connector.oplog_manager:188 - OplogThread: Cursor is still alive and thread is still running.
2016-02-29 15:17:25,095 [DEBUG] mongo_connector.oplog_manager:188 - OplogThread: Cursor is still alive and thread is still running.
2016-02-29 15:17:26,105 [DEBUG] mongo_connector.oplog_manager:188 - OplogThread: Cursor is still alive and thread is still running.
2016-02-29 15:17:27,107 [DEBUG] mongo_connector.oplog_manager:188 - OplogThread: Cursor is still running.
Nothing went wrong with my installation.
Data added to MongoDB after the mongo-connector service started is automatically reflected in Neo4j. Data that was already in MongoDB before the service started is not loaded into Neo4j.
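A hedged workaround for the pre-existing data, based on how mongo-connector records its progress in the oplog.timestamp file shown above: stop the connector, remove the progress file, and start it again, which should make it perform a fresh collection dump instead of resuming from the saved timestamp (the file name below assumes the default; adjust it if you passed a custom path via -o/--oplog-ts):
rm oplog.timestamp
mongo-connector -m localhost:27017 -t http://localhost:7474/db/data -d neo4j_doc_manager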

iRODS configuration - Could not start iRODS server during

I've installed Postgres as the database and then iRODS on Ubuntu 14.04. Then I started its configuration:
sudo /var/lib/irods/packaging/setup_irods.sh
After the configuration phase, when iRODS starts updating, the first 4 steps go well:
Stopping iRODS server...
-----------------------------
Running irods_setup.pl...
Step 1 of 4: Configuring database user...
Updating user's .pgpass...
Skipped. File already uptodate.
Step 2 of 4: Creating database and tables...
Checking whether iCAT database exists...
[mydb] on [localhost] found.
Updating user's .odbc.ini...
Creating iCAT tables...
Skipped. Tables already created.
Testing database communications...
Step 3 of 4: Configuring iRODS server...
Updating /etc/irods/server_config.json...
Updating /etc/irods/database_config.json...
Step 4 of 4: Configuring iRODS user and starting server...
Updating iRODS user's ~/.irods/irods_environment.json...
Starting iRODS server...
but at the end I get this error
Could not start iRODS server.
Starting iRODS server...
Traceback (most recent call last):
File "/var/lib/irods/iRODS/scripts/python/get_db_schema_version.py", line 77, in <module>
current_schema_version = get_current_schema_version(cfg)
File "/var/lib/irods/iRODS/scripts/python/get_db_schema_version.py", line 61, in get_current_schema_version
'get_current_schema_version: failed to find result line for schema_version\n\n{}'.format(format_cmd_result(result)))
RuntimeError: get_current_schema_version: failed to find result line for schema_version
return code: [0]
stdout:
stderr:
ERROR: relation "r_grid_configuration" does not exist
LINE 1: ...option_value from R_GRID_CON...
^
Confirming catalog_schema_version... Success
Validating [/var/lib/irods/.irods/irods_environment.json]... Success
Validating [/etc/irods/server_config.json]... Success
Validating [/etc/irods/hosts_config.json]... Success
Validating [/etc/irods/host_access_control_config.json]... Success
Validating [/etc/irods/database_config.json]... Success
(1) Waiting for process bound to port 5432 ... [-]
(2) Waiting for process bound to port 5432 ... [-]
(4) Waiting for process bound to port 5432 ... [-]
Port 5432 In Use ... Not Starting iRODS Server
Install problem:
Cannot start iRODS server.
Found 0 processes:
There are no iRODS servers running.
Abort.
Do you have any idea what went wrong?
Because I don't have enough reputation to comment:
Which version of iRODS are you using?
This portion of the output:
Creating iCAT tables...
Skipped. Tables already created.
combined with this portion:
ERROR: relation "r_grid_configuration" does not exist
suggests that the setup ran before but only partially completed, leaving the system in a broken state. I would recommend reinstalling from scratch, which includes:
Uninstalling the iRODS icat and db plugin packages:
sudo dpkg -P irods-icat irods-database-plugin-postgres
note: make sure to use the -P, so that the configuration files are removed from dpkg's database.
Dropping and remaking the database (see the sketch after this list)
Deleting the following directories:
sudo rm -rf /tmp/irods /etc/irods /var/lib/irods
Reinstalling the packages and running sudo /var/lib/irods/packaging/setup_irods.sh
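For the "dropping and remaking the database" step, a sketch using standard Postgres tooling, assuming the iCAT database is the local [mydb] reported during setup (substitute your own database name, and recreate it with whatever owner and permissions your original setup used):
sudo -u postgres dropdb mydb
sudo -u postgres createdb mydb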
This portion of the output:
(1) Waiting for process bound to port 5432 ... [-]
(2) Waiting for process bound to port 5432 ... [-]
(4) Waiting for process bound to port 5432 ... [-]
Port 5432 In Use ... Not Starting iRODS Server
suggests that you are using port 5432 as your iRODS server port. This will conflict with the default Postgres port. I recommend using the default iRODS server port of 1247. This value was queried during setup as:
iRODS server's port [1247]:
and is recorded in /etc/irods/server_config.json under the zone_port entry.
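A quick way to confirm what is currently configured (the file path and key are the ones mentioned above):
grep zone_port /etc/irods/server_config.json
# expected on a default install: "zone_port": 1247,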
iRODS-Chat:
It may be easier to continue this on the iRODS-Chat Google group. Repairing installs can require back-and-forth communication, which may not be in line with standard Stack Overflow usage.

mongo-connector not working with --unique-key

mongo-connector --unique-key=id --auto-commit-interval=0 -m localhost:27017 -t http://localhost:8983/solr -d /Library/Python/2.7/site-packages/mongo_connector/doc_managers/solr_doc_manager.py --admin-username admin --password bypass
I'm using the command above to connect MongoDB and Apache Solr, but I'm getting the following error at the end:
2014-05-17 12:38:20,607 - INFO - Beginning Mongo Connector
2014-05-17 12:38:22,200 - INFO - Starting new HTTP connection (1): localhost
2014-05-17 12:38:22,439 - INFO - Finished 'http://localhost:8983/solr/admin/luke?show=schema&wt=json' (get) with body '' in 0.404 seconds.
2014-05-17 12:38:22,527 - INFO - OplogThread: Initializing oplog thread
2014-05-17 12:38:22,580 - INFO - MongoConnector: Starting connection thread MongoClient('localhost', 27017)
Exception in thread Thread-2:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 808, in __bootstrap_inner
self.run()
File "/Library/Python/2.7/site-packages/mongo_connector/oplog_manager.py", line 181, in run
dm.remove(entry)
File "/Library/Python/2.7/site-packages/mongo_connector/doc_managers/solr_doc_manager.py", line 192, in remove
self.solr.delete(id=str(doc[self.unique_key]),
KeyError: 'id'
Please help me.
Did you explicitly create a unique field called "id"? That is not the default field added by MongoDB; the default unique field is "_id".
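In other words, unless you have actually added an id field to both your documents and the Solr schema, the simplest fix is probably to drop --unique-key so the default _id is used (a sketch based on the command from the question):
mongo-connector --auto-commit-interval=0 -m localhost:27017 -t http://localhost:8983/solr -d /Library/Python/2.7/site-packages/mongo_connector/doc_managers/solr_doc_manager.py --admin-username admin --password bypass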