Snakemake and CloudFormation cluster error with local scratch space - aws-cloudformation

I am having a problem using local scratch space with cfncluster and Snakemake at the same time. My strategy is to write data to local scratch on each node in the cluster and then move the data to the NFS partition.
Unfortunately, I am getting the following error (snakemake 4.0.0, cfncluster):
/shared/bin/bin/snakemake --rerun-incomplete -s /shared/scripts/sra_to_fa_cluster.py -j 1 -p --latency-wait 20 -k -c " qsub -cwd -V" -F
/shared/dbGAP/sra_toolkit/sratoolkit.2.8.2-1-ubuntu64/bin/fastq-dump --split-files --gzip --outdir /scratch/ /shared/dbGAP/sras2/test/SRR2135300.sra
Waiting at most 20 seconds for missing files.
Exception in thread Thread-1:
Traceback (most recent call last):
File "/shared/bin/lib/python3.6/site-packages/snakemake/dag.py", line 319, in check_and_touch_output
wait_for_files(expanded_output, latency_wait=wait)
File "/shared/bin/lib/python3.6/site-packages/snakemake/io.py", line 395, in wait_for_files
latency_wait, "\n".join(get_missing())))
OSError: Missing files after 20 seconds:
/scratch/SRR2135300_2.fastq.gz
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/shared/bin/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/shared/bin/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/shared/bin/lib/python3.6/site-packages/snakemake/executors.py", line 647, in _wait_for_jobs
active_job.callback(active_job.job)
File "/shared/bin/lib/python3.6/site-packages/snakemake/scheduler.py", line 287, in _proceed
self.get_executor(job).handle_job_success(job)
File "/shared/bin/lib/python3.6/site-packages/snakemake/executors.py", line 549, in handle_job_success
super().handle_job_success(job, upload_remote=False)
File "/shared/bin/lib/python3.6/site-packages/snakemake/executors.py", line 178, in handle_job_success
ignore_missing_output=ignore_missing_output)
File "/shared/bin/lib/python3.6/site-packages/snakemake/dag.py", line 323, in check_and_touch_output
"wait time with --latency-wait.", rule=job.rule)
snakemake.exceptions.MissingOutputException: Missing files after 20 seconds:
/scratch/SRR2135300_2.fastq.gz
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
This is similar to the error reported here:
https://bitbucket.org/snakemake/snakemake/issues/462/unhandled-missingoutputexception-in
The Snakemake script is as follows:
rule all:
    input:
        expand("/shared/dbGAP/sras2/fastq.gz/{sample}_{end}.fastq.gz",
               sample=SAMPLES, end=END)

rule move:
    input:
        left="/scratch/{sample}_1.fastq.gz",
        right="/scratch/{sample}_2.fastq.gz"
    output:
        left="/shared/dbGAP/sras2/fastq.gz/{sample}_1.fastq.gz",
        right="/shared/dbGAP/sras2/fastq.gz/{sample}_2.fastq.gz"
    shell:
        "rsync --remove-source-files -av {input.left} {output.left}; "
        "rsync --remove-source-files -av {input.right} {output.right};"

rule get_fastq_files_from_sra_file:
    input:
        sras="/shared/dbGAP/sras2/test/{sample}.sra"
    output:
        left="/scratch/{sample}_1.fastq.gz",
        right="/scratch/{sample}_2.fastq.gz"
    shell:
        "/shared/dbGAP/sra_toolkit/sratoolkit.2.8.2-1-ubuntu64/bin/fastq-dump --split-files --gzip --outdir /scratch/ {input}"
My feeling is that Snakemake cannot "see" the scratch space on the compute nodes, so it reports the output as missing, but I am not sure how to solve this issue.
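One possible direction, sketched below purely as an illustration (the rule name sra_to_fastq and the combined shell command are mine, not something tested on cfncluster): declare only the NFS destinations as output: and do the dump-to-scratch plus the rsync inside a single job, so both steps run on the same compute node and the files Snakemake checks from the master node live on the shared filesystem.
rule sra_to_fastq:
    input:
        sras="/shared/dbGAP/sras2/test/{sample}.sra"
    output:
        left="/shared/dbGAP/sras2/fastq.gz/{sample}_1.fastq.gz",
        right="/shared/dbGAP/sras2/fastq.gz/{sample}_2.fastq.gz"
    # dump to node-local scratch and move to NFS inside the same shell command,
    # so only NFS paths ever appear as outputs that Snakemake has to verify
    shell:
        "/shared/dbGAP/sra_toolkit/sratoolkit.2.8.2-1-ubuntu64/bin/fastq-dump "
        "--split-files --gzip --outdir /scratch/ {input.sras} && "
        "rsync --remove-source-files -av /scratch/{wildcards.sample}_1.fastq.gz {output.left} && "
        "rsync --remove-source-files -av /scratch/{wildcards.sample}_2.fastq.gz {output.right}"
With that change, rule all can stay as it is, since it already expects the files under /shared/dbGAP/sras2/fastq.gz/.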

Related

Overwriting an unsuccessful Anaconda3 installation

I deleted the anaconda directory under my home directory and the bashrc configuration.
Now I need to install it again, but a problem occurs on Linux even when overwriting the unsuccessful installation.
Should I delete some additional config files? How can I handle this?
sh Downloads/Anaconda3-2022.10-Linux-x86_64.sh -u -p /home/user/anaconda3/
PREFIX=/home/user/anaconda3
Unpacking payload ...
concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
File "concurrent/futures/process.py", line 384, in wait_result_broken_or_wakeup
File "multiprocessing/connection.py", line 256, in recv
TypeError: __init__() missing 1 required positional argument: 'msg'
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "entry_point.py", line 69, in <module>
File "concurrent/futures/process.py", line 559, in _chain_from_iterable_of_lists
File "concurrent/futures/_base.py", line 608, in result_iterator
File "concurrent/futures/_base.py", line 445, in the result
File "concurrent/futures/_base.py", line 390, in __get_result
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[8382] Failed to execute script entry_point
Make sure you have deleted the .conda directory under your home directory and that you have enough disk space.
There is no need to delete .cache or any bin libraries.
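If it helps, here is a small optional Python check for the disk-space part (not part of the installer; /home/user matches the prefix used in the command above):
import shutil

# free space on the filesystem holding the installation prefix
free_gb = shutil.disk_usage("/home/user").free / 1e9
print(f"{free_gb:.1f} GB free")  # a full Anaconda3 install needs several GB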

snakemake fails due to jobscript not found

I'm running Snakemake on fairly large workflows. Somewhat randomly, I get errors like:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/prog/Python/3.7.9-foss-2018a/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/prog/Python/3.7.9-foss-2018a/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "<home>/.pip/CentOS/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 1069, in _wait_for_jobs
status = job_status(active_job)
File "<home>/.pip/CentOS/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 1051, in job_status
os.remove(active_job.jobscript)
FileNotFoundError: [Errno 2] No such file or directory: '<current working dir>/.snakemake/tmp.0w7jh5bc/snakejob.<name of rule>.6868.sh'
It happens somewhat randomly, and restarting the workflow usually resolves the problem. I think this might be due to filesystem latency; however, the --latency-wait flag seems to apply only to output files. Is there a way to make Snakemake wait for job scripts as well?

bitbake fails at the simplest recipe

I just installed Yocto, on the morty branch, and executed the following commands:
cd poky
source oe-init-build-env build-qemuarm
In conf/local.conf, I changed the machine to MACHINE ?= "qemuarm".
Then I executed the following:
$ bitbake core-image-minimal
Loading cache: 100% |##########################################################################################################| Time: 0:00:00
Loaded 1320 entries from dependency cache.
ERROR: Execution of event handler 'sstate_eventhandler2' failed
Traceback (most recent call last):
File "/home/some-user/projects/melp/poky/meta/classes/sstate.bbclass", line 1015, in sstate_eventhandler2(e=<bb.event.ReachableStamps object at 0x7fbc17f2e0f0>):
for l in lines:
> (stamp, manifest, workdir) = l.split()
if stamp not in stamps:
ValueError: not enough values to unpack (expected 3, got 1)
ERROR: Command execution failed: Traceback (most recent call last):
File "/home/some-user/projects/melp/poky/bitbake/lib/bb/command.py", line 101, in runAsyncCommand
self.cooker.updateCache()
File "/home/some-user/projects/melp/poky/bitbake/lib/bb/cooker.py", line 1658, in updateCache
bb.event.fire(event, self.databuilder.mcdata[mc])
File "/home/some-user/projects/melp/poky/bitbake/lib/bb/event.py", line 201, in fire
fire_class_handlers(event, d)
File "/home/some-user/projects/melp/poky/bitbake/lib/bb/event.py", line 124, in fire_class_handlers
execute_handler(name, handler, event, d)
File "/home/some-user/projects/melp/poky/bitbake/lib/bb/event.py", line 96, in execute_handler
ret = handler(event)
File "/home/some-user/projects/melp/poky/meta/classes/sstate.bbclass", line 1015, in sstate_eventhandler2
(stamp, manifest, workdir) = l.split()
ValueError: not enough values to unpack (expected 3, got 1)
It looks like a Python error. Does anyone know what the issue is? Am I using the wrong version?
Here is the output of python --version:
$ python --version
Python 2.7.12
What am I doing wrong?
You realise that Morty is 18 months old and in a few weeks will no longer be supported, right?
Anyway, it looks like your sstate-cache/ is somehow corrupted. Delete your tmp/ and sstate-cache/ directories and try again.
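For what it's worth, the ValueError itself is just the three-field unpack in sstate.bbclass failing on a stamps-list line that does not contain three whitespace-separated values, which fits a corrupted cache. A tiny illustration (the strings below are made up):
# sstate_eventhandler2 expects "stamp manifest workdir" on each line
good = "stampfile manifestfile workdir"
stamp, manifest, workdir = good.split()   # unpacks cleanly

bad = "stampfile"                         # e.g. a truncated line in the cache
try:
    stamp, manifest, workdir = bad.split()
except ValueError as err:
    print(err)                            # not enough values to unpack (expected 3, got 1)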

unable to run mongo-connector

I have installed mongo-connector on the MongoDB server.
I am executing it with the command:
mongo-connector -m [remote mongo server IP]:[remote mongo server port] -t [elastic search server IP]:[elastic search server Port] -d elastic_doc_manager.py
I also tried the following, since mongo is running on the same server with the default port:
mongo-connector -t [elastic search server IP]:[elastic search server Port] -d elastic_doc_manager.py
I am getting this error:
Traceback (most recent call last):
File "/usr/local/bin/mongo-connector", line 9, in <module>
load_entry_point('mongo-connector==2.3.dev0', 'console_scripts', 'mongo-connector')()
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/util.py", line 85, in wrapped
func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/connector.py", line 1037, in main
conf.parse_args()
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/config.py", line 118, in parse_args
option, dict((k, values.get(k)) for k in option.cli_names))
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/connector.py", line 820, in apply_doc_managers
module = import_dm_by_name(dm['docManager'])
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/connector.py", line 810, in import_dm_by_name
"Could not import %s." % full_name)
mongo_connector.errors.InvalidConfiguration: Could not import mongo_connector.doc_managers.elastic_doc_manager.py.
NOTE: I am using Python 2.7, mongo-connector 2.3, and Elasticsearch server 2.2.
Any suggestions?
[edit]
After applying Val's suggestion:
2016-02-29 19:56:59,519 [CRITICAL] mongo_connector.oplog_manager:549 - Exception during collection dump
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/oplog_manager.py", line 501, in do_dump
upsert_all(dm)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/oplog_manager.py", line 485, in upsert_all
dm.bulk_upsert(docs_to_dump(namespace), mapped_ns, long_ts)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/util.py", line 32, in wrapped
return f(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/doc_managers/elastic_doc_manager.py", line 190, in bulk_upsert
for ok, resp in responses:
File "/usr/local/lib/python2.7/dist-packages/elasticsearch-1.9.0-py2.7.egg/elasticsearch/helpers/__init__.py", line 160, in streaming_bulk
for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch-1.9.0-py2.7.egg/elasticsearch/helpers/__init__.py", line 132, in _process_bulk_chunk
raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
BulkIndexError: (u'2 document(s) failed to index.',..document_class=dict, tz_aware=False, connect=True, replicaset=u'mss'), u'local'), u'oplog.rs')
2016-02-29 19:56:59,835 [ERROR] mongo_connector.connector:302 - MongoConnector: OplogThread unexpectedly stopped! Shutting down
Hi Val,
I connected to another MongoDB instance, which had only one database with one collection of 30,000+ records, and I was able to execute it successfully. The previous MongoDB instance had multiple databases (around 7), each with multiple collections (around 5 to 15 per database), and all of them held a good number of documents (ranging from 500 to 50,000).
Was mongo-connector failing because of the large amount of data residing in the MongoDB databases?
I have further queries:
a. Is it possible to index only specific collections in MongoDB, residing in different databases? I want to index only specific collections (not the entire database). How can I achieve this?
b. In Elasticsearch I can see duplicate indexes for one collection: the first one with the database name (as expected), the other one named mongodb_meta, both holding the same data. If I change the collection, the update happens in both of them.
c. Is it possible to configure the output index name or any other parameters in any way?
I think the only issue is that you have the .py extension on the doc manager (it was needed before mongo-connector 2.0); you simply need to remove it:
mongo-connector -m [remote mongo server IP]:[remote mongo server port] -t [elastic search server IP]:[elastic search server Port] -d elastic_doc_manager
I found this option to index only a specific collection:
$ mongo-connector -m mongodbserver:27017 -t elasticserver:9200 -d elastic_doc_manager --oplog-ts oplogstatus.txt --namespace-set database.collection
It started working after running the command below with the --oplog-ts option:
mongo-connector -m localhost:27017 -t localhost:37017 -d mongo_doc_manager --oplog-ts oplogstatus.txt
But it fails if I use a config file. Kindly advise how to resolve this issue.
C:\Dev\mongodb\mongo-connector>mongo-connector -c myconfig.json --oplog-ts oplogstatus.txt
Fatal Exception
Traceback (most recent call last):
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\config.py", line 110, in parse_args
self.load_json(f.read())
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\config.py", line 132, in load_json
parsed_config = json.loads(text)
File "C:\Program Files\Python\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\Program Files\Python\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Program Files\Python\lib\json\decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 6 column 21 (char 201)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\util.py", line 90, in wrapped
func(*args, **kwargs)
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\connector.py", line 1059, in main
conf.parse_args()
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\config.py", line 112, in parse_args
reraise(errors.InvalidConfiguration, *sys.exc_info()[1:])
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\compat.py", line 9, in reraise
raise exctype(str(value)).with_traceback(trace)
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\config.py", line 110, in parse_args
self.load_json(f.read())
File "C:\Program Files\Python\lib\site-packages\mongo_connector-2.5.0.dev0-py3.6.egg\mongo_connector\config.py", line 132, in load_json
parsed_config = json.loads(text)
File "C:\Program Files\Python\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\Program Files\Python\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Program Files\Python\lib\json\decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
mongo_connector.errors.InvalidConfiguration: Invalid \escape: line 6 column 21 (char 201)
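The underlying JSONDecodeError ("Invalid \escape") usually means a Windows-style path in the config has unescaped backslashes. A quick Python illustration (the key and path below are invented for the example, not taken from myconfig.json):
import json

# backslashes in JSON strings must be doubled (or use forward slashes)
bad  = r'{"oplogFile": "C:\Dev\mongodb\oplogstatus.txt"}'
good = r'{"oplogFile": "C:\\Dev\\mongodb\\oplogstatus.txt"}'

try:
    json.loads(bad)
except json.JSONDecodeError as err:
    print(err)              # Invalid \escape: ...

print(json.loads(good))     # {'oplogFile': 'C:\\Dev\\mongodb\\oplogstatus.txt'}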
Try this:
pip install 'elastic2-doc-manager[elastic5]'
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager
(answer on GitHub)
Your strategy seems sound to me. Here's how to do this:
1. Generate a mongo-connector timestamp file: run mongo-connector --no-dump and stop mongo-connector right after it starts up. Now you have an oplog.timestamp file pointing to the latest entry on the oplog.
2. Run mongodump on the primary. The dump already reflects all the changes that mongo-connector saw in the oplog.
3. Run mongorestore with the dump from (2) on the target MongoDB.
4. Restart mongo-connector. Pass the file generated in (1) to the --oplog-ts option.
I'll add this to the wiki.

Ansible error due to GMP package version on CentOS 6

I have a Dockerfile that builds an image based on CentOS (tag: centos6):
FROM centos
RUN rpm -iUvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
RUN yum update -y
RUN yum install ansible -y
ADD ./ansible /home/root/ansible
RUN cd /home/root/ansible;ansible-playbook -v -i hosts site.yml
Everything works fine until Docker hits the last line; then I get the following errors:
[WARNING]: The version of gmp you have installed has a known issue regarding
timing vulnerabilities when used with pycrypto. If possible, you should update
it (ie. yum update gmp).
PLAY [all] ********************************************************************
GATHERING FACTS ***************************************************************
Traceback (most recent call last):
File "/usr/bin/ansible-playbook", line 317, in <module>
sys.exit(main(sys.argv[1:]))
File "/usr/bin/ansible-playbook", line 257, in main
pb.run()
File "/usr/lib/python2.6/site-packages/ansible/playbook/__init__.py", line 319, in run
if not self._run_play(play):
File "/usr/lib/python2.6/site-packages/ansible/playbook/__init__.py", line 620, in _run_play
self._do_setup_step(play)
File "/usr/lib/python2.6/site-packages/ansible/playbook/__init__.py", line 565, in _do_setup_step
accelerate_port=play.accelerate_port,
File "/usr/lib/python2.6/site-packages/ansible/runner/__init__.py", line 204, in __init__
cmd = subprocess.Popen(['ssh','-o','ControlPersist'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
File "/usr/lib64/python2.6/subprocess.py", line 642, in __init__
errread, errwrite)
File "/usr/lib64/python2.6/subprocess.py", line 1234, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Stderr from the command:
package epel-release-6-8.noarch is already installed
I imagine that the cause of the error is the gmp package not being up to date.
There is a related issue on GitHub: https://github.com/ansible/ansible/issues/6941
But there doesn't seem to be any solution at the moment...
Any ideas?
Thanks in advance!
My site.yml playbook:
- hosts: all
  pre_tasks:
    - shell: echo 'hello'
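As an aside on the traceback above: an OSError with [Errno 2] raised from subprocess.Popen means the program being launched (ssh, in this case) could not be found at all, independent of the gmp warning. A minimal reproduction (the command name below is deliberately nonexistent):
import subprocess

try:
    # launching a binary that is not on PATH fails the same way the
    # ansible-playbook traceback does for 'ssh'
    subprocess.Popen(["nonexistent-binary"],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
except OSError as err:
    print(err)   # [Errno 2] No such file or directory: 'nonexistent-binary'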
Make sure that the files site.yml and hosts are present in the directory you're adding to /home/root/ansible.
As a side note, you can simplify your Dockerfile by using WORKDIR:
WORKDIR /home/root/ansible
RUN ansible-playbook -v -i hosts site.yml