Errors when processing gsutil -m -q setmeta - google-cloud-storage

When processing the command:
gsutil -m -q setmeta -h "Cache-Control:public, max-age=10"
I get these errors frequently:
ERROR 1028 16:10:46.257674 retry_decorator.py] Retrying in 0.94 seconds ...
Traceback (most recent call last):
File "/usr/local/share/google/google-cloud-sdk/platform/gsutil/third_party/retry-decorator/retry_decorator/retry_decorator.py", line 20, in f_retry
return f(*args, **kwargs)
File "/usr/local/share/google/google-cloud-sdk/platform/gsutil/gslib/commands/setmeta.py", line 248, in SetMetadataFunc
provider=exp_src_url.scheme)
File "/usr/local/share/google/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 212, in PatchObjectMetadata
generation=generation, preconditions=preconditions, fields=fields)
File "/usr/local/share/google/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 819, in PatchObjectMetadata
generation=generation)
File "/usr/local/share/google/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1308, in _TranslateExceptionAndRaise
raise translated_exception
PreconditionException: PreconditionException: 412 Precondition Failed
The server is on Google Compute Engine and is updated frequently with:
gcloud components update
It seems the process actually completes, but these errors keep occurring. Any idea what causes them and whether there is a solution?
Thanks.

This can occur for two reasons:
1. Another client updated the object (or its metadata) concurrently.
2. There was a transient service or network error that needed to be retried (thus the "Retrying" message), but the original request actually succeeded. The retry gets preconditioned against the original object's metageneration, so it fails, even though the original operation succeeded.
If the cause is #1, you can solve it by avoiding concurrent updates to the objects. If the cause is #2, unfortunately, there is not much you can do.
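If you mainly want to confirm that these errors are benign, you can check the object's metadata after the command finishes (for example with gsutil stat gs://bucket/object) and verify that Cache-Control was applied. Below is a minimal sketch of the same check using the google-cloud-storage Python client rather than gsutil; the bucket and object names are placeholders, and the explicit metageneration precondition mirrors what the answer above describes.

from google.api_core.exceptions import PreconditionFailed
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").get_blob("path/to/object")  # placeholders

blob.cache_control = "public, max-age=10"
try:
    # precondition the patch on the metageneration we just read
    blob.patch(if_metageneration_match=blob.metageneration)
except PreconditionFailed:
    # the metadata changed in the meantime (possibly our own earlier,
    # successful attempt); re-read and check whether the value is there
    blob.reload()
    if blob.cache_control != "public, max-age=10":
        raise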

Related

How do I get past "Could not queue the build because there were validation errors or warnings." while automating pipeline creation using az-cli

I am trying to automate rsync pipeline creation using az-cli.
This is the command I am running from a local clone of my repository:
az pipelines create --name my_pipeline --yml-path azure-pipeline.yml --project my_project --repository my_repo --repository-type tfsgit
The pipeline is created but it is not able to queue it. Here are the details from the --debug switch. Am I missing something?
The expected output was to not only create the pipeline but also run it.
WARNING: This command is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
WARNING: cli.azext_devops.dev.pipelines.pipeline_create: Successfully created a pipeline with Name: my_pipeline, Id: 2019.
DEBUG: msrest.exceptions: Could not queue the build because there were validation errors or warnings.
DEBUG: cli.azext_devops.dev.common.exception_handler: handling vsts service error
DEBUG: cli.azure.cli.core.util: azure.cli.core.util.handle_exception is called with an exception:
DEBUG: cli.azure.cli.core.util: Traceback (most recent call last):
File "/usr/lib64/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 691, in _run_job
result = cmd_copy(params)
File "/usr/lib64/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 328, in __call__
return self.handler(*args, **kwargs)
File "/usr/lib64/az/lib/python3.6/site-packages/azure/cli/core/commands/command_operation.py", line 121, in handler
return op(**command_args)
File "/home/user/.azure/cliextensions/azure-devops/azext_devops/dev/pipelines/pipeline_create.py", line 155, in pipeline_create
project=project)
File "/home/user/.azure/cliextensions/azure-devops/azext_devops/devops_sdk/v5_1/build/build_client.py", line 337, in queue_build
content=content)
File "/home/user/.azure/cliextensions/azure-devops/azext_devops/devops_sdk/client.py", line 90, in _send
response = self._send_request(request=request, headers=headers, content=content, media_type=media_type)
File "/home/user/.azure/cliextensions/azure-devops/azext_devops/devops_sdk/client.py", line 54, in _send_request
self._handle_error(request, response)
File "/home/user/.azure/cliextensions/azure-devops/azext_devops/devops_sdk/client.py", line 233, in _handle_error
raise AzureDevOpsServiceError(wrapped_exception)
azext_devops.devops_sdk.exceptions.AzureDevOpsServiceError: Could not queue the build because there were validation errors or warnings.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib64/az/lib/python3.6/site-packages/knack/cli.py", line 231, in invoke
cmd_result = self.invocation.execute(args)
File "/usr/lib64/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 657, in execute
raise ex
File "/usr/lib64/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 720, in _run_jobs_serially
results.append(self._run_job(expanded_arg, cmd_copy))
File "/usr/lib64/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 712, in _run_job
return cmd_copy.exception_handler(ex)
return cmd_copy.exception_handler(ex)
File "/home/user/.azure/cliextensions/azure-devops/azext_devops/dev/common/exception_handler.py", line 18, in azure_devops_exception_handler
raise CLIError(ex)
knack.util.CLIError: Could not queue the build because there were validation errors or warnings.
ERROR: cli.azure.cli.core.azclierror: Could not queue the build because there were validation errors or warnings.
ERROR: az_command_data_logger: Could not queue the build because there were validation errors or warnings.
DEBUG: cli.knack.cli: Event: Cli.PostExecute [<function AzCliLogging.deinit_cmd_metadata_logging at 0x7fe2e4a682f0>]
INFO: az_command_data_logger: exit code: 1
INFO: cli.main: Command ran in 2.552 seconds (init: 0.200, invoke: 2.352)
INFO: telemetry.save: Save telemetry record of length 3257 in cache
WARNING: telemetry.check: Negative: The /home/user/.azure/telemetry.txt was modified at 2022-04-07 14:29:35.737231, which in less than 600.000000 s
Additional information: I am setting the AZURE_DEVOPS_EXT_PAT env variable to authenticate and use az-cli commands.
The error message says it all: the build can't be queued because there are validation errors in the YAML.
The pipeline (Id: 2019) was created; you need to review the YAML and fix the validation errors before it will run:
Open a browser and navigate to https://dev.azure.com/<your-organization-name>/<your-project-name>/_build?definitionId=2019
Click the Edit button.
In the ellipsis context menu, select Validate.
The error message about the invalid syntax will be shown in a dialog box.
Alternatively, the Azure DevOps REST API exposes endpoints that do the same:
the Pipelines - Preview endpoint, or
the Runs - Run Pipeline endpoint with the previewRun parameter specified in the request body.
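For reference, here is a minimal sketch of calling the latter with Python and requests; the organization name is a placeholder, the api-version may need adjusting, pipeline Id 2019 comes from the log above, and the same PAT from AZURE_DEVOPS_EXT_PAT is reused for basic auth. On a failed validation, the response body contains the error details.

import base64
import os
import requests

organization = "my-org"   # placeholder: your organization name
project = "my_project"
pipeline_id = 2019        # the Id reported by az pipelines create above
pat = os.environ["AZURE_DEVOPS_EXT_PAT"]
auth = base64.b64encode(f":{pat}".encode()).decode()

# previewRun=true validates and expands the YAML without queuing a build
url = (f"https://dev.azure.com/{organization}/{project}"
       f"/_apis/pipelines/{pipeline_id}/runs?api-version=6.0-preview.1")
resp = requests.post(url,
                     headers={"Authorization": f"Basic {auth}"},
                     json={"previewRun": True})
print(resp.status_code)
print(resp.json())  # validation errors, or the final expanded YAML on success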

snakemake fails due to jobscript not found

I'm running snakemake on fairly large workflows. Somewhat randomly I get errors like
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/prog/Python/3.7.9-foss-2018a/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/prog/Python/3.7.9-foss-2018a/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "<home>/.pip/CentOS/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 1069, in _wait_for_jobs
status = job_status(active_job)
File "<home>/.pip/CentOS/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 1051, in job_status
os.remove(active_job.jobscript)
FileNotFoundError: [Errno 2] No such file or directory: '<current working dir>/.snakemake/tmp.0w7jh5bc/snakejob.<name of rule>.6868.sh'
It happens somewhat randomly, and restarting the workflow usually resolves the problem. I think this might be due to filesystem latency; however, the --latency-wait flag seems to apply only to output files. Is there a way to make snakemake wait for jobscripts as well?

Snakemake and cloud formation cluster error with local scratch space

I am having a problem using local scratch space on cfncluster and snakemake at the same time. My strategy is to write data to local scratch for each node in the cluster and then move the data to the NFS partition.
Unfortunately I am getting the following error:
snakemake 4.0.0, cfncluster
/shared/bin/bin/snakemake --rerun-incomplete -s /shared/scripts/sra_to_fa_cluster.py -j 1 -p --latency-wait 20 -k -c " qsub -cwd -V" -F
/shared/dbGAP/sra_toolkit/sratoolkit.2.8.2-1-ubuntu64/bin/fastq-dump --split-files --gzip --outdir /scratch/ /shared/dbGAP/sras2/test/SRR2135300.sra
Waiting at most 20 seconds for missing files.
Exception in thread Thread-1:
Traceback (most recent call last):
File "/shared/bin/lib/python3.6/site-packages/snakemake/dag.py", line 319, in check_and_touch_output
wait_for_files(expanded_output, latency_wait=wait)
File "/shared/bin/lib/python3.6/site-packages/snakemake/io.py", line 395, in wait_for_files
latency_wait, "\n".join(get_missing())))
OSError: Missing files after 20 seconds:
/scratch/SRR2135300_2.fastq.gz
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/shared/bin/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/shared/bin/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/shared/bin/lib/python3.6/site-packages/snakemake/executors.py", line 647, in _wait_for_jobs
active_job.callback(active_job.job)
File "/shared/bin/lib/python3.6/site-packages/snakemake/scheduler.py", line 287, in _proceed
self.get_executor(job).handle_job_success(job)
File "/shared/bin/lib/python3.6/site-packages/snakemake/executors.py", line 549, in handle_job_success
super().handle_job_success(job, upload_remote=False)
File "/shared/bin/lib/python3.6/site-packages/snakemake/executors.py", line 178, in handle_job_success
ignore_missing_output=ignore_missing_output)
File "/shared/bin/lib/python3.6/site-packages/snakemake/dag.py", line 323, in check_and_touch_output
"wait time with --latency-wait.", rule=job.rule)
snakemake.exceptions.MissingOutputException: Missing files after 20 seconds:
/scratch/SRR2135300_2.fastq.gz
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
This is similar to the error reported here:
https://bitbucket.org/snakemake/snakemake/issues/462/unhandled-missingoutputexception-in
Snakemake script is as follows:
rule all:
    input: expand("/shared/dbGAP/sras2/fastq.gz/{sample}_{end}.fastq.gz",
                  sample=SAMPLES, end=END)

rule move:
    input: left="/scratch/{sample}_1.fastq.gz", right="/scratch/{sample}_2.fastq.gz"
    output: left="/shared/dbGAP/sras2/fastq.gz/{sample}_1.fastq.gz", right="/shared/dbGAP/sras2/fastq.gz/{sample}_2.fastq.gz"
    shell: "rsync --remove-source-files -av {input.left} {output.left}; rsync --remove-source-files -av {input.right} {output.right};"

rule get_fastq_files_from_sra_file:
    input: sras="/shared/dbGAP/sras2/test/{sample}.sra"
    output: left="/scratch/{sample}_1.fastq.gz", right="/scratch/{sample}_2.fastq.gz"
    shell: "/shared/dbGAP/sra_toolkit/sratoolkit.2.8.2-1-ubuntu64/bin/fastq-dump --split-files --gzip --outdir /scratch/ {input}"
My feeling is that snakemake cannot "see" the local scratch space on the compute nodes, so it reports the output as missing, but I am not sure how to solve this issue.
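One way to keep that strategy (write to node-local scratch, then move to NFS) while only declaring outputs the head node can see is to do both steps inside a single rule, so the rule's declared outputs live on the shared filesystem. A rough sketch, reusing the paths from the question (untested, and it assumes /scratch exists on every node):

rule get_fastq_files_from_sra_file:
    input:
        sras="/shared/dbGAP/sras2/test/{sample}.sra"
    output:
        left="/shared/dbGAP/sras2/fastq.gz/{sample}_1.fastq.gz",
        right="/shared/dbGAP/sras2/fastq.gz/{sample}_2.fastq.gz"
    # dump to node-local scratch, then move the results to the NFS partition
    # within the same job, so the declared outputs are visible to the head node
    shell:
        "/shared/dbGAP/sra_toolkit/sratoolkit.2.8.2-1-ubuntu64/bin/fastq-dump "
        "--split-files --gzip --outdir /scratch/ {input.sras} && "
        "rsync --remove-source-files -av /scratch/{wildcards.sample}_1.fastq.gz {output.left} && "
        "rsync --remove-source-files -av /scratch/{wildcards.sample}_2.fastq.gz {output.right}"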

redis-py raises AttributeError

In what circumstances would redis-py raise the following AttributeError exception?
Isn't redis-py built by design to raise only redis.exceptions.RedisError based exceptions?
What would be a reasonable handling logic?
Traceback (most recent call last):
File "c:\Python27\Lib\threading.py", line 551, in __bootstrap_inner
self.run()
File "c:\Python27\Lib\threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "C:\Users\Administrator\Documents\my_proj\my_module.py", line 33, in inner
ret = protected_func(*args, **kwargs)
File "C:\Users\Administrator\Documents\my_proj\my_module.py", line 104, in _listen
for message in _pubsub.listen():
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\client.py", line 1555, in listen
r = self.parse_response()
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\client.py", line 1499, in parse_response
response = self.connection.read_response()
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\connection.py", line 306, in read_response
response = self._parser.read_response()
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\connection.py", line 104, in read_response
response = self.read()
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\connection.py", line 89, in read
return self._fp.readline()[:-2]
AttributeError: 'NoneType' object has no attribute 'readline'
Seems like an old question, but I faced the same problem recently.
My setup was using celery with redis as a broker. A ThreadPoolExecutor uses the shared celery object to batch tasks to workers, and the batcher function waits for the submitted tasks to finish using celery.result.ResultSet.
After a quick investigation, I found that celery uses a pub/sub mechanism somewhere to wait for the tasks to finish. And that is it: pub/sub doesn't play well with thread safety, per the official README: https://github.com/andymccurdy/redis-py#thread-safety
Honestly, I didn't try to prove my theory; I fixed my problem by switching to a ProcessPoolExecutor instead.
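If you do want to keep a thread pool, a common workaround (a minimal sketch, assuming a local Redis instance and a made-up channel name) is to give each thread its own client, and therefore its own connection and PubSub object, instead of sharing one across threads:

import threading

import redis

def listen(channel):
    # one client per thread; nothing connection-related is shared
    r = redis.Redis(host="localhost", port=6379)
    pubsub = r.pubsub()
    pubsub.subscribe(channel)
    for message in pubsub.listen():
        if message["type"] == "message":
            print(message["data"])

t = threading.Thread(target=listen, args=("my-channel",), daemon=True)
t.start()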

ipython 0.13 zmq errors

I am encountering weird behavior of an IPython cluster. The calculations finish, but many results never reach the client (and the engines just idle after finishing their first calculation).
I suspect something is wrong with zmq because: 1) from time to time I see the following error:
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/parallel/client/asyncresult.py", line 118, in get
if not self.ready():
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/parallel/client/asyncresult.py", line 132, in ready
self.wait(0)
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/parallel/client/asyncresult.py", line 142, in wait
self._ready = self._client.wait(self.msg_ids, timeout)
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/parallel/client/client.py", line 1058, in wait
self.spin()
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/parallel/client/client.py", line 1015, in spin
self._flush_results(self._task_socket)
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/parallel/client/client.py", line 814, in _flush_results
idents,msg = self.session.recv(sock, mode=zmq.NOBLOCK)
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/zmq/session.py", line 642, in recv
idents, msg_list = self.feed_identities(msg_list, copy)
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/zmq/session.py", line 673, in feed_identities
idx = msg_list.index(DELIM)
ValueError: '<IDS|MSG>' is not in list
and 2) IPython.zmq has two test failures:
======================================================================
ERROR: test_send (IPython.zmq.tests.test_session.TestSession)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/clusterdata/python/env_stable/lib/python2.7/site-packages/IPython/zmq/tests/test_session.py", line 76, in test_send
socket = MockSocket(zmq.Context.instance(),zmq.PAIR)
File "/clusterdata/python/env_stable/lib/python2.7/site-packages/IPython/zmq/tests/test_session.py", line 34, in __init__
self.data = []
File "/clusterdata/python/env_stable/lib/python2.7/site-packages/zmq/sugar/attrsettr.py", line 38, in __setattr__
self.__class__.__name__, upper_key)
AttributeError: MockSocket has no such option: DATA
======================================================================
ERROR: test_send (IPython.zmq.tests.test_session.TestSession)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/clusterdata/python/env_stable/lib/python2.7/site-packages/zmq/tests/__init__.py", line 108, in tearDown
raise RuntimeError("context could not terminate, open sockets likely remain in test")
RuntimeError: context could not terminate, open sockets likely remain in test
----------------------------------------------------------------------
I use pyzmq 13.0.0 (as installed by pip) and zeromq 3.2.2, compiled by the pyzmq setup. I use IPython 0.13.1 and Python 2.7.3.
Any suggestions as to what this could be, and if not, how I could gather more information on why these errors occur?
Update: It turns out the slowdown was due to a long task queue in ipcontroller, which was taking 100% CPU and lagging horribly. That is a separate issue, but I would still appreciate feedback on the above.
Answered by @minrk in comments: the ZMQ errors were unimportant, the performance issue was due to scheduling, and it was solved by setting TaskScheduler.hwm=0.
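For anyone hitting the same thing, a minimal sketch of where that setting can go, assuming the default profile is used (it can also be passed on the command line, e.g. ipcontroller --TaskScheduler.hwm=0):

# ~/.ipython/profile_default/ipcontroller_config.py
c = get_config()

# 0 removes the scheduler's high-water mark, so tasks are handed out to
# engines instead of being buffered in the controller's queue
c.TaskScheduler.hwm = 0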