unable to run mongo-connector - mongodb

I have installed mongo-connector on the MongoDB server.
I am executing it with the following command:
mongo-connector -m [remote mongo server IP]:[remote mongo server port] -t [elastic search server IP]:[elastic search server Port] -d elastic_doc_manager.py
I also tried the following, since mongo is running on the same server with the default port:
mongo-connector -t [elastic search server IP]:[elastic search server Port] -d elastic_doc_manager.py
I am getting this error:
Traceback (most recent call last):
File "/usr/local/bin/mongo-connector", line 9, in <module>
load_entry_point('mongo-connector==2.3.dev0', 'console_scripts', 'mongo-connector')()
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/util.py", line 85, in wrapped
func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/connector.py", line 1037, in main
conf.parse_args()
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/config.py", line 118, in parse_args
option, dict((k, values.get(k)) for k in option.cli_names))
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/connector.py", line 820, in apply_doc_managers
module = import_dm_by_name(dm['docManager'])
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/connector.py", line 810, in import_dm_by_name
"Could not import %s." % full_name)
**mongo_connector.errors.InvalidConfiguration: Could not import mongo_connector.doc_managers.elastic_doc_manager.py.**
NOTE: I am using Python 2.7 with mongo-connector 2.3, and the Elasticsearch server is 2.2.
Any suggestions?
[edit]
After applying Val's suggestion:
2016-02-29 19:56:59,519 [CRITICAL] mongo_connector.oplog_manager:549 - Exception during collection dump
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/oplog_manager.py", line 501, in do_dump
upsert_all(dm)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/oplog_manager.py", line 485, in upsert_all
dm.bulk_upsert(docs_to_dump(namespace), mapped_ns, long_ts)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/util.py", line 32, in wrapped
return f(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/doc_managers/elastic_doc_manager.py", line 190, in bulk_upsert
for ok, resp in responses:
File "/usr/local/lib/python2.7/dist-packages/elasticsearch-1.9.0-py2.7.egg/elasticsearch/helpers/__init__.py", line 160, in streaming_bulk
for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch-1.9.0-py2.7.egg/elasticsearch/helpers/__init__.py", line 132, in _process_bulk_chunk
raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
BulkIndexError: (u'2 document(s) failed to index.', ...document_class=dict, tz_aware=False, connect=True, replicaset=u'mss'), u'local'), u'oplog.rs')
2016-02-29 19:56:59,835 [ERROR] mongo_connector.connector:302 - MongoConnector: OplogThread unexpectedly stopped! Shutting down
Hi Val,
I connected to another MongoDB instance, which had only one database with one collection of 30,000+ records, and I was able to execute it successfully. The previous MongoDB instance had multiple databases (around 7), each of which internally had multiple collections (around 5 to 15 per database), all holding a good number of documents (ranging from 500 to 50,000).
Was mongo-connector failing because of the huge amount of data residing in the MongoDB instance?
I have further queries:
a. Is it possible to index only specific collections in MongoDB, residing in different databases? I want to index only specific collections (not the entire database). How can I achieve this?
b. In Elasticsearch I can see duplicate indices for one collection. The first one is named after the database (as expected), the other one is named mongodb_meta; both have the same data, and if I change the collection, the update happens in both.
c. Is it possible to configure the output index name or any other parameters somehow?

I think the only issue is that you have the .py extension on the doc manager (it was needed before mongo-connector 2.0); you simply need to remove it:
mongo-connector -m [remote mongo server IP]:[remote mongo server port] -t [elastic search server IP]:[elastic search server Port] -d elastic_doc_manager

I found this option to sync only a specific collection:
$ mongo-connector -m mongodbserver:27017 -t elasticserver:9200 -d elastic_doc_manager --oplog-ts oplogstatus.txt --namespace-set database.collection
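If you prefer a config file over CLI flags, the same restriction can be expressed with the namespaces option. This is a minimal sketch, assuming mongo-connector's JSON config format; database.collection is a placeholder for your own namespace:
{
    "mainAddress": "mongodbserver:27017",
    "oplogFile": "oplogstatus.txt",
    "namespaces": {
        "include": ["database.collection"]
    },
    "docManagers": [
        {
            "docManager": "elastic_doc_manager",
            "targetURL": "elasticserver:9200"
        }
    ]
}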

It started working after giving the below command with the --oplog-ts option.
mongo-connector -m localhost:27017 -t localhost:37017 -d mongo_doc_manager --oplog-ts oplogstatus.txt
But it fails if I use a config file. Kindly advise how to resolve this issue.
C:\Dev\mongodb\mongo-connector>mongo-connector -c myconfig.json --oplog-ts oplogstatus.txt
Fatal Exception
Traceback (most recent call last):
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\config.py", line 110, in parse_args
self.load_json(f.read())
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\config.py", line 132, in load_json
parsed_config = json.loads(text)
File "C:\Program Files\Python\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\Program Files\Python\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Program Files\Python\lib\json\decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 6 column 21 (char 201)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\util.py", line 90, in wrapped
func(*args, **kwargs)
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\connector.py", line 1059, in main
conf.parse_args()
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\config.py", line 112, in parse_args
reraise(errors.InvalidConfiguration, *sys.exc_info()[1:])
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\compat.py", line 9, in reraise
raise exctype(str(value)).with_traceback(trace)
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\config.py", line 110, in parse_args
self.load_json(f.read())
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\config.py", line 132, in load_json
parsed_config = json.loads(text)
File "C:\Program Files\Python\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\Program Files\Python\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Program Files\Python\lib\json\decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
mongo_connector.errors.InvalidConfiguration: Invalid \escape: line 6 column 21 (char 201)
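Note: the "Invalid \escape" JSONDecodeError above usually means an unescaped Windows backslash inside a string in myconfig.json. In JSON, backslashes in paths must be doubled, or you can use forward slashes instead. A hypothetical example (the oplogFile value is an assumed placeholder, not taken from the actual config):
{
    "oplogFile": "C:\\Dev\\mongodb\\mongo-connector\\oplogstatus.txt"
}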

Try this.
pip install 'elastic2-doc-manager[elastic5]'
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager

Answer on GitHub:
Your strategy seems sound to me. Here's how to do this:
1. Generate a mongo-connector timestamp file:
   a. Run mongo-connector --no-dump.
   b. Stop mongo-connector right after it starts up. Now you have an oplog.timestamp file pointing to the latest entry on the oplog.
2. Run mongodump on the primary. The dump already reflects all the changes that mongo-connector saw in the oplog.
3. Run mongorestore with the dump from (2) on the target MongoDB.
4. Restart mongo-connector. Pass the file generated in (1) to the --oplog-ts option (see the condensed sketch below).
I'll add this to the wiki.
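In shell terms, a condensed sketch of steps (1) through (4); the hostnames, ports, and doc manager name are placeholders:
mongo-connector -m primary:27017 -t target:27017 -d doc_manager --no-dump   # step 1: stop it (Ctrl-C) right after startup
mongodump --host primary:27017 --out dump/                                  # step 2: dump the primary
mongorestore --host target:27017 dump/                                      # step 3: restore onto the target
mongo-connector -m primary:27017 -t target:27017 -d doc_manager --oplog-ts oplog.timestamp   # step 4: resume from the saved file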

Related

Overwrite anaconda3 unsuccessful installation

I deleted the anaconda directory under home and the bashrc configuration.
Now I need to install it again, but a problem occurs even if I overwrite the unsuccessful installation on Linux.
Should I delete some additional config files? How can I handle this?
sh Downloads/Anaconda3-2022.10-Linux-x86_64.sh -u -p /home/user/anaconda3/
PREFIX=/home/user/anaconda3
Unpacking payload ...
concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
File "concurrent/futures/process.py", line 384, in wait_result_broken_or_wakeup
File "multiprocessing/connection.py", line 256, in recv
TypeError: __init__() missing 1 required positional argument: 'msg'
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "entry_point.py", line 69, in <module>
File "concurrent/futures/process.py", line 559, in _chain_from_iterable_of_lists
File "concurrent/futures/_base.py", line 608, in result_iterator
File "concurrent/futures/_base.py", line 445, in the result
File "concurrent/futures/_base.py", line 390, in __get_result
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[8382] Failed to execute script entry_point
Make sure you have deleted the .conda directory under your home directory and that you have enough disk space.
There is no need to delete .cache or any bin libraries.
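A minimal cleanup-and-retry sketch of the above (the paths assume the default install location from the question):
rm -rf /home/user/anaconda3 ~/.conda   # remove the broken install and the leftover .conda directory
df -h /home                            # confirm there is enough free disk space
sh Downloads/Anaconda3-2022.10-Linux-x86_64.sh -u -p /home/user/anaconda3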

Taking a database backup in Odoo 11 causes an error

While taking a backup from Odoo 11, I get the following error. How can I solve this?
Database backup error: Postgres subprocess ('/usr/bin/pg_dump', '--no-owner', '--file=/tmp/tmpa36uaqdp/dump.sql', 'simple_25_10_19') error 1
This error occurs when your PostgreSQL client and server versions do not match. Check your versions.
More info on Postgres versions in a Docker setup can be found here: odoo12 database backup no owner?
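A quick way to compare the two versions (a sketch; it assumes the default binary location from the error message and a locally reachable server):
/usr/bin/pg_dump --version             # client version used by Odoo
psql -c "SELECT version();"            # server version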
I had Odoo 12 CE on Amazon AWS. The problem was that the HDD was full.
Enjoy!!
2020-05-24 14:21:44,230 1280 INFO xxxxx.mx odoo.service.db: DUMP DB: xxxx.mx format zip
2020-05-24 14:23:22,250 1280 ERROR xxxx.mx odoo.addons.web.controllers.main:Database.backup
Traceback (most recent call last):
File "/opt/odoosrc/12.0/odoo/addons/web/controllers/main.py", line 758, in backup
dump_stream = odoo.service.db.dump_db(name, None, backup_format)
File "<decorator-gen-9>", line 2, in dump_db
File "/opt/odoosrc/12.0/odoo/odoo/service/db.py", line 40, in if_db_mgt_enabled
return method(self, *args, **kwargs)
File "/opt/odoosrc/12.0/odoo/odoo/service/db.py", line 225, in dump_db
odoo.tools.exec_pg_command(*cmd)
File "/opt/odoosrc/12.0/odoo/odoo/tools/misc.py", line 129, in exec_pg_command
raise Exception('Postgres subprocess %s error %s' % (args2, rc))
Exception: Postgres subprocess ('/usr/bin/pg_dump', '--no-owner', '--file=/tmp/tmppiurb5iy/dump.sql', 'solidaridad.dri.com.mx') error 1

Can't connect Python with Titan DB

Following the steps to configure the Titan server:
bin/titan.sh
Forking Cassandra...
Running `nodetool statusthrift`... OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9300)... OK (connected to 127.0.0.1:9300).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182)... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.
The server started perfectly, but when I connect with Python and run the script, I get the error mentioned below:
Traceback (most recent call last):
File "/home/admin-12/Documents/bitbucket/ecodrone/ecodrone/GremlinConnector.py", line 28, in <module>
data = (execute_query("""g.V()"""))
File "/home/admin-12/Documents/bitbucket/ecodrone/ecodrone/GremlinConnector.py", line 22, in execute_query
results = future_results.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/admin-12/.local/lib/python3.6/site-packages/gremlin_python/driver/resultset.py", line 81, in cb
f.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/admin-12/.local/lib/python3.6/site-packages/gremlin_python/driver/connection.py", line 77, in _receive
self._protocol.data_received(data, self._results)
File "/home/admin-12/.local/lib/python3.6/site-packages/gremlin_python/driver/protocol.py", line 71, in data_received
result_set = results_dict[request_id]
KeyError: None
Versions I am using:
titan - 1.0.0
gremlin-python - 3.3.2
apache-tinkerpop-gremlin-server-3.3.1
Titan supports an extremely old version of TinkerPop, and I'm sure you'll find some incompatibility there if you try to use gremlin-python 3.3.2. As Titan is no longer supported, I suggest you upgrade to JanusGraph, a more current and maintained version of Titan.
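If you stay on this stack for now, a first thing to try (an assumption on my part, not a confirmed fix) is pinning the Python client to the same TinkerPop version as the gremlin-server listed above:
pip install gremlinpython==3.3.1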

Snakemake and cloud formation cluster error with local scratch space

I am having a problem using local scratch space on cfncluster and snakemake at the same time. My strategy is to write data to local scratch for each node in the cluster and then move the data to the NFS partition.
Unfortunately I am getting the following error (snakemake 4.0.0, cfncluster):
/shared/bin/bin/snakemake --rerun-incomplete -s /shared/scripts/sra_to_fa_cluster.py -j 1 -p --latency-wait 20 -k -c " qsub -cwd -V" -F
/shared/dbGAP/sra_toolkit/sratoolkit.2.8.2-1-ubuntu64/bin/fastq-dump --split-files --gzip --outdir /scratch/ /shared/dbGAP/sras2/test/SRR2135300.sra
Waiting at most 20 seconds for missing files.
Exception in thread Thread-1:
Traceback (most recent call last):
File "/shared/bin/lib/python3.6/site-packages/snakemake/dag.py", line 319, in check_and_touch_output
wait_for_files(expanded_output, latency_wait=wait)
File "/shared/bin/lib/python3.6/site-packages/snakemake/io.py", line 395, in wait_for_files
latency_wait, "\n".join(get_missing())))
OSError: Missing files after 20 seconds:
/scratch/SRR2135300_2.fastq.gz
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/shared/bin/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/shared/bin/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/shared/bin/lib/python3.6/site-packages/snakemake/executors.py", line 647, in _wait_for_jobs
active_job.callback(active_job.job)
File "/shared/bin/lib/python3.6/site-packages/snakemake/scheduler.py", line 287, in _proceed
self.get_executor(job).handle_job_success(job)
File "/shared/bin/lib/python3.6/site-packages/snakemake/executors.py", line 549, in handle_job_success
super().handle_job_success(job, upload_remote=False)
File "/shared/bin/lib/python3.6/site-packages/snakemake/executors.py", line 178, in handle_job_success
ignore_missing_output=ignore_missing_output)
File "/shared/bin/lib/python3.6/site-packages/snakemake/dag.py", line 323, in check_and_touch_output
"wait time with --latency-wait.", rule=job.rule)
snakemake.exceptions.MissingOutputException: Missing files after 20 seconds:
/scratch/SRR2135300_2.fastq.gz
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
This is similar to the error reported here:
https://bitbucket.org/snakemake/snakemake/issues/462/unhandled-missingoutputexception-in
Snakemake script is as follows:
rule all:
    input: expand("/shared/dbGAP/sras2/fastq.gz/{sample}_{end}.fastq.gz", sample=SAMPLES, end=END)

rule move:
    input: left="/scratch/{sample}_1.fastq.gz", right="/scratch/{sample}_2.fastq.gz"
    output: left="/shared/dbGAP/sras2/fastq.gz/{sample}_1.fastq.gz", right="/shared/dbGAP/sras2/fastq.gz/{sample}_2.fastq.gz"
    shell: "rsync --remove-source-files -av {input.left} {output.left}; rsync --remove-source-files -av {input.right} {output.right};"

rule get_fastq_files_from_sra_file:
    input: sras="/shared/dbGAP/sras2/test/{sample}.sra"
    output: left="/scratch/{sample}_1.fastq.gz", right="/scratch/{sample}_2.fastq.gz"
    shell: "/shared/dbGAP/sra_toolkit/sratoolkit.2.8.2-1-ubuntu64/bin/fastq-dump --split-files --gzip --outdir /scratch/ {input}"
My feeling is that snakemake cannot "see" the scratch space on the nodes, so it reports the files as missing, but I am not sure how to solve this issue.
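If that is the cause, one workaround sketch (an assumption, not a confirmed fix) is to declare only the NFS files as outputs and keep /scratch strictly inside the shell command, so the master node never has to stat node-local files:
rule get_fastq_files_from_sra_file:
    input: sras="/shared/dbGAP/sras2/test/{sample}.sra"
    # outputs live on the shared NFS partition, which the master node can see
    output: left="/shared/dbGAP/sras2/fastq.gz/{sample}_1.fastq.gz", right="/shared/dbGAP/sras2/fastq.gz/{sample}_2.fastq.gz"
    shell:
        "/shared/dbGAP/sra_toolkit/sratoolkit.2.8.2-1-ubuntu64/bin/fastq-dump --split-files --gzip --outdir /scratch/ {input.sras}; "
        "rsync --remove-source-files -av /scratch/{wildcards.sample}_1.fastq.gz {output.left}; "
        "rsync --remove-source-files -av /scratch/{wildcards.sample}_2.fastq.gz {output.right}"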

How to connect to remote MongoDB with mongo-connector?

How can I connect to a MongoDB cluster on MongoDB Atlas using mongo-connector?
I have tried to connect to my cluster with the following commands:
First attempt
sudo mongo-connector -m "mongodb://g******:*********#rest-api-data-shard-00-00-xemv3.mongodb.net:27017,rest-api-data-shard-00-01-xemv3.mongodb.net:27017,rest-api-data-shard-00-02-xemv3.mongodb.net:27017/admin?ssl
=true&replicaSet=rest-api-data-shard-0&authSource=admin" -a g****** -p "***********" -t http://localhost:9200 -d elastic2_doc_manager
Response:
Logging to mongo-connector.log.
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/local/lib/python2.7/site-packages/mongo_connector/util.py", line 90, in wrapped
func(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/mongo_connector/connector.py", line 263, in run
main_conn['admin'].authenticate(self.auth_username, self.auth_key)
File "/usr/local/lib/python2.7/site-packages/pymongo/database.py", line 1018, in authenticate
connect=True)
File "/usr/local/lib/python2.7/site-packages/pymongo/mongo_client.py", line 434, in _cache_credentials
raise OperationFailure('Another user is already authenticated '
OperationFailure: Another user is already authenticated to this database. You must logout first.
Second attempt:
sudo mongo-connector -m "mongodb://rest-api-data-shard-00-00-xemv3.mongodb.net:27017,rest-api-data-shard-00-01-xemv3.mongodb.net:27017,rest-api-data-shard-00-02-xemv3.mongodb.net:27017/admin?replicaSet=rest-api-data-shard-0" -a g********* -p "********" -t http://localhost:9200 -d elastic2_doc_manager
Response:
Logging to mongo-connector.log.
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/local/lib/python2.7/site-packages/mongo_connector/util.py", line 90, in wrapped
func(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/mongo_connector/connector.py", line 263, in run
main_conn['admin'].authenticate(self.auth_username, self.auth_key)
File "/usr/local/lib/python2.7/site-packages/pymongo/database.py", line 1018, in authenticate
connect=True)
File "/usr/local/lib/python2.7/site-packages/pymongo/mongo_client.py", line 439, in _cache_credentials
writable_preferred_server_selector)
File "/usr/local/lib/python2.7/site-packages/pymongo/topology.py", line 210, in select_server
address))
File "/usr/local/lib/python2.7/site-packages/pymongo/topology.py", line 186, in select_servers
self._error_message(selector))
ServerSelectionTimeoutError: rest-api-data-shard-00-02-xemv3.mongodb.net:27017: [Errno 54] Connection reset by peer,rest-api-data-shard-00-00-xemv3.mongodb.net:27017: [Errno 54] Connection reset by peer,rest-api-data-shard-00-01-xemv3.mongodb.net:27017: [Errno 54] Connection reset by peer
Answered on the GitHub issue. Solution:
In your first attempt, the problem is that you are specifying the MongoDB username and password twice. Remove the -a g****** -p "***********" and it should work fine. If you need to authenticate to Elasticsearch, you need to use a mongo-connector config file and set the correct authentication options for the Python Elasticsearch client, e.g.:
{
    "mainAddress": "mongodb://user:pass@mongodb:27017,mongodb:27018,mongodb:27019/admin?ssl=true&replicaSet=name&authSource=admin",
    "verbosity": 1,
    "docManagers": [
        {
            "docManager": "elastic2_doc_manager",
            "targetURL": "http://localhost:9200",
            "args": {
                "clientOptions": {
                    "http_auth": ["user", "secret"],
                    "use_ssl": true
                }
            }
        }
    ]
}
In your second attempt, it looks like the problem is that you forgot to add ssl=true to the MongoDB connection string. That's why you're getting Connection reset by peer errors.
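Putting both fixes together, the first command would look roughly like this (credentials and hostnames redacted as in the question):
sudo mongo-connector -m "mongodb://g******:*********@rest-api-data-shard-00-00-xemv3.mongodb.net:27017,rest-api-data-shard-00-01-xemv3.mongodb.net:27017,rest-api-data-shard-00-02-xemv3.mongodb.net:27017/admin?ssl=true&replicaSet=rest-api-data-shard-0&authSource=admin" -t http://localhost:9200 -d elastic2_doc_manager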