CannotPullContainerError: AWS Batch Job - amazon-ecs

I am trying to run a job in AWS Batch. This is my first attempt.
I have a Python script which reads files from an S3 bucket, processes them and makes tables in RDS Postgres.
I have made a Docker image with my script, pandas, boto3 and SQLAlchemy, and pushed it to hub.docker.com.
When I try to run a job in AWS Batch I get the error below:
CannotPullContainerError: Error response from daemon: pull access denied for *dockerimagename*, repository does not exist or may require 'docker login'
What is a possible solution? I have been stuck on this for a long time.

I had this issue when I was putting only the image name in the Container Image field of the job definition. So I was putting:
*dockerimagename*
when I should have been putting:
0123456789.dkr.ecr.us-east-1.amazonaws.com/*dockerimagename*
You can get the first part of that by going to your ECR > Repositories in the AWS console and copying the link from there (there's even a button to do it).
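As a rough illustration of the difference between the two values, the sketch below assembles a fully qualified ECR image reference from its parts. The helper name `ecr_image_uri`, the `latest` tag default, and the sample values are assumptions made up for this example; only the registry-host-plus-repository shape comes from the answer above.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the fully qualified reference for an image stored in ECR."""
    # Hypothetical helper: a bare repository name is looked up on Docker Hub,
    # so the ECR registry host has to be spelled out in the image field.
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


if __name__ == "__main__":
    # Placeholder values taken from the answer, not a real account.
    print(ecr_image_uri("0123456789", "us-east-1", "dockerimagename"))
```

Whatever string ends up in the image field, it should match what the ECR console shows for the repository.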

Related

CreateContainerError with microk8s, ghcr.io image

The error message is CreateContainerError
Error: failed to create containerd container: error unpacking image: failed to extract layer sha256:b9b5285004b8a3: failed to get stream processor for application/vnd.in-toto+json: no processor for media-type: unknown
The image pull was successful with the token I supplied (jmtoken).
I am testing on an AWS EC2 t2.medium; the Docker image was tested on my local machine.
Has anybody experienced this issue? How did you solve it?
deployment yaml file
I found a bug in my YAML file.
I had specified both a command in the Kubernetes manifest and a CMD in the Dockerfile. The command in the manifest takes precedence, so the CMD in the Dockerfile, which is the actual command, never ran, and that caused side effects including this issue.
Another tip: adding a sleep 3000 command in the Kubernetes manifest sometimes helps with other issues, such as crashes.
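To make the override behaviour concrete, here is a small sketch that models which process a container ends up running, given the image's ENTRYPOINT/CMD and the manifest's command/args. The function `effective_argv` and the sample values are hypothetical and written only for illustration; the precedence rules are the ones described in the Kubernetes documentation.

```python
from typing import List, Optional


def effective_argv(
    image_entrypoint: List[str],
    image_cmd: List[str],
    k8s_command: Optional[List[str]] = None,
    k8s_args: Optional[List[str]] = None,
) -> List[str]:
    """Model which process a container ends up running."""
    if k8s_command is not None:
        # `command` replaces ENTRYPOINT, and the image CMD is dropped too;
        # only `args` from the manifest (if any) are appended.
        return k8s_command + (k8s_args or [])
    if k8s_args is not None:
        # `args` alone replaces CMD but keeps the image ENTRYPOINT.
        return image_entrypoint + k8s_args
    # Nothing set in the manifest: the image defaults apply.
    return image_entrypoint + image_cmd


if __name__ == "__main__":
    # Hypothetical image whose Dockerfile ends with CMD ["python", "app.py"].
    print(effective_argv([], ["python", "app.py"], k8s_command=["sleep", "3000"]))
    print(effective_argv([], ["python", "app.py"]))
```

With a command set in the manifest, the first call shows that the Dockerfile CMD is not part of what runs, which matches the behaviour described above.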

GCloud authentication race conditions

I'm trying to avoid race conditions with gcloud / gsutil authentication on the same system but different CI/CD jobs on my Gitlab-Runner on a Mac Mini.
I have tried setting the auth manually with
RUN gcloud auth activate-service-account --key-file="gitlab-runner.json"
RUN gcloud config set project $GCP_PROJECT_ID
for the Dockerfile (in which I'm performing a download operation from a Google Cloud Storage bucket).
I'm using a gcloud configuration in the bash script that runs the docker command, and in the same script I authenticate with
gcloud config configurations activate $TARGET
where I've previously run the two commands above to save them to the configuration.
The configurations are working fine if I start the CI/CD jobs one after the other has finished. But I want to trigger them for all clients at the same time, which causes race conditions with gcloud authentication and one of the jobs trying to download from the wrong project bucket.
How can I avoid the race condition? I'm already authenticating before each gsutil command, but it's still causing the race condition. Do I need something like Cloud Build to separate the runtime environments?
You can use Cloud Build to get separate execution environments, but this might be overkill for your use case: a Cloud Build worker uses an entire VM, which might be too heavy. Linux containers / Docker can provide the necessary isolation as well.
You should make sure that each container you run has a unique config file placed in the path expected by gcloud. The issue may come from improper volume mounting (all the containers share the same location on the host). You could mount a directory containing the configuration file (unique for each bucket) when running an image, or run gcloud config configurations activate in a Dockerfile step (thus creating image variants for different buckets, if that’s feasible).
Alternatively, and I think this solution might be easier, you can switch from the Cloud SDK distribution to the standalone gsutil distribution. That way you can provide a path to a boto configuration file through an environment variable.
Such variables can be specified when running a Docker image.
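A minimal sketch of the per-job isolation idea, assuming each client has its own boto config file on the runner and that the standalone gsutil picks it up through the BOTO_CONFIG environment variable. The helper `docker_run_args`, the directory layout, and the image name are hypothetical; the function only builds the argument list and does not start Docker.

```python
from pathlib import PurePosixPath
from typing import List


def docker_run_args(client: str, image: str, config_root: str) -> List[str]:
    """Build a `docker run` command that gives one client its own boto config."""
    # Assumed layout on the host: <config_root>/<client>/boto.cfg
    host_cfg = PurePosixPath(config_root) / client / "boto.cfg"
    container_cfg = "/config/boto.cfg"
    return [
        "docker", "run", "--rm",
        # Mount only this client's file, read-only, so jobs share no state.
        "-v", f"{host_cfg}:{container_cfg}:ro",
        # Point gsutil at the mounted file via the BOTO_CONFIG variable.
        "-e", f"BOTO_CONFIG={container_cfg}",
        image,
    ]


if __name__ == "__main__":
    print(" ".join(docker_run_args("client-a", "downloader:latest", "/srv/ci/configs")))
```

Because every job gets its own file and nothing is written to a shared gcloud configuration on the host, concurrent jobs have no common credentials state to race on.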

Is there a way to deploy scheduled queries to GCP directly through a github action, with a configurable schedule?

Currently using GCP BigQuery UI for scheduled queries, everything is manually done.
Wondering if there's a way to automatically deploy to GCP using a config JSON that contains the scheduled query's parameters and scheduled times through github actions?
So far, this is one option I've found that makes it more "automated":
- store query in a file on Cloud Storage. When invoking Cloud Function, you read the file content and you perform a bigQuery job on it.
- have to update the file content to update the query
- con: read file from storage, then call BQ: 2 api calls and query file to manage
Currently using DBT in other repos to automate and make this process quicker: https://docs.getdbt.com/docs/introduction
I would prefer the GitHub Actions version though; I just haven't found good documentation yet :)
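For what the config-driven part could look like, here is a sketch that reads a JSON config and turns each entry into a settings dict for one scheduled query. The input keys (`name`, `query`, `schedule`, `destination_dataset`) are invented for this example. The output keys follow my understanding of the fields the BigQuery Data Transfer Service uses for scheduled queries and should be checked against its documentation. The actual API call a workflow step would make is deliberately left out.

```python
import json
from typing import Any, Dict, List

# Hypothetical config schema for this sketch.
REQUIRED_KEYS = ("name", "query", "schedule", "destination_dataset")


def load_scheduled_queries(config_text: str) -> List[Dict[str, Any]]:
    """Turn a JSON config into one settings dict per scheduled query."""
    entries = json.loads(config_text)
    settings = []
    for entry in entries:
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            # Fail the CI job early instead of deploying a partial config.
            raise ValueError(f"{entry.get('name', '<unnamed>')}: missing {missing}")
        settings.append({
            "display_name": entry["name"],
            "data_source_id": "scheduled_query",
            "destination_dataset_id": entry["destination_dataset"],
            "schedule": entry["schedule"],  # e.g. "every 24 hours"
            "params": {"query": entry["query"]},
        })
    return settings
```

Keeping the schedule and the query text in the same versioned file means a change to either one goes through the normal pull request flow.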

How to copy yarn ssh logs automatically using scala to blob storage

We have a requirement to download the YARN logs to blob storage automatically. I found that the YARN logs do get added to the storage account under a path like /app-logs/user/logs/, but they are in a binary format and there is no documented way to convert them to text. So we are trying to run the external command yarn logs -applicationId <application_id> from Scala at the end of our application run to capture the logs and save them to blob storage, but we are facing issues with that. We are looking for a way to get these logs copied to the storage account automatically as part of the Spark pipeline itself.
I tried redirecting the output of the yarn logs command to a temp file and then copying the file from local to blob storage. These commands work fine when I ssh into the head node of the spark cluster and run them. But they are not working when executed from jupyter notebook or scala application.
("yarn logs -applicationId application_1561088998595_xxx > /tmp/yarnlog_2.txt") !!
("hadoop dfs -fs wasbs://dev52mss@sahdimssperfdev.blob.core.windows.net -copyFromLocal /tmp/yarnlog_2.txt /tmp/") !!
When I run these commands using jupyter notebook, the first command works fine to redirect to a local file but the second one to move the file to blob fails with the following error:
warning: there was one feature warning; re-run with -feature for details
java.lang.RuntimeException: Nonzero exit value: 1
at scala.sys.package$.error(package.scala:27)
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.slurp(ProcessBuilderImpl.scala:132)
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang$bang(ProcessBuilderImpl.scala:102)
... 56 elided
Initially I tried capturing the output of the command as a Dataframe and writing the dataframe to blob. It succeeded for small logs but for huge logs it failed with the error:
Serialized task 15:0 was 137500581 bytes, which exceeds max allowed: spark.rpc.message.maxSize (134217728 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values
val yarnLog = Seq(Process("yarn logs -applicationId " + "application_1560960859861_0003").!!).toDF()
yarnLog.write.mode("overwrite").text("wasbs://container@storageAccount.blob.core.windows.net/Dev/Logs/application_1560960859861_0003.txt")
Note: You can directly access the log files using Azure Storage => Blobs => Select Container => app logs
Azure HDInsight stores its log files both in the cluster file system and in Azure storage. You can examine log files in the cluster by opening an SSH connection to the cluster and browsing the file system, or by using the Hadoop YARN Status portal on the remote head node server. You can examine the log files in Azure storage using any of the tools that can access and download data from Azure storage.
Examples are AzCopy, CloudXplorer, and the Visual Studio Server Explorer. You can also use PowerShell and the Azure Storage Client libraries, or the Azure .NET SDKs, to access data in Azure blob storage.
For more details, refer to "Manage logs for Azure HDInsight cluster".
Hope this helps.
Currently, you will need to use the 'yarn logs' command to view YARN logs.
Regarding your requirement, there are two ways to achieve this:
Method 1:
Schedule a daily copy of the app-logs folder into a desired container within the blob storage. This will do a differential copy every day at a specific time of the day. For this one, I had to use Azure Data Factory to achieve the scheduling. Quite easy and no manual copy or coding required.
However, because the YARN application logs are stored in TFile binary format and can only be read with the ‘yarn logs’ command, you will need another tool to read the files at the destination later on. You can use this tool to read them: https://github.com/shanyu/hadooplogparser
Alternatively, you can have your own simple script that converts the log to a readable file before the transfer. A sample script is below:
yarn logs -applicationId application_15645293xxxxx > /tmp/source/applog_back.txt
hadoop dfs -fs wasbs://hdiblob@sandboxblob.blob.core.windows.net -copyFromLocal /tmp/source/applog_back.txt /tmp/destination
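The same two steps can be sketched from a program instead of a shell. This is a minimal Python version, assuming the `yarn` and `hadoop` commands are on the PATH of the machine that runs it; the helper names `run_to_file` and `export_yarn_log` are made up for this example. It redirects output by handing an open file to the child process, since a ">" inside a command string is only interpreted when a shell runs the command.

```python
import subprocess
from typing import List


def run_to_file(command: List[str], output_path: str) -> None:
    """Run a command and stream its stdout straight into a file."""
    # The command is a list, so no shell is involved; the redirect is done
    # by passing the open file as the child's stdout.
    with open(output_path, "wb") as out:
        subprocess.run(command, stdout=out, check=True)


def export_yarn_log(application_id: str, local_path: str, wasbs_dir: str) -> None:
    """Dump one application's log to local disk, then copy it to blob storage."""
    run_to_file(["yarn", "logs", "-applicationId", application_id], local_path)
    # check=True raises CalledProcessError on a nonzero exit code.
    subprocess.run(["hadoop", "fs", "-copyFromLocal", local_path, wasbs_dir], check=True)
```

Streaming to a file also keeps the log out of driver memory, so its size is not bounded by limits such as spark.rpc.message.maxSize.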
Method 2:
This is the simplest and cheapest method. You can disable the retention period of the YARN application logs, which means the logs will be retained indefinitely. To do this, set the config “yarn.log-aggregation.retain-seconds” to -1. This config can be found in yarn-site.xml.
Once this is done, you can always read your Yarn Applications logs anytime from the cluster using the Yarn UI or CLI.
Hope this helps

Deployed jobs stopped working with an image error?

In the last few hours I am no longer able to execute deployed Data Fusion pipeline jobs - they just end in an error state almost instantly.
I can run the jobs in Preview mode, but when trying to run deployed jobs this error appears in the logs:
com.google.api.gax.rpc.InvalidArgumentException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Selected software image version '1.2.65-deb9' can no longer be used to create new clusters. Please select a more recent image
I've tried with both an existing instance and a new instance, and all deployed jobs including the sample jobs give this error.
Any ideas? I cannot find any config options for what image is used for execution
We are currently investigating an issue with the Cloud Dataproc image used by Cloud Data Fusion. For the launch we had pinned a version of the Dataproc VM image, and that pinned version is causing the issue.
We apologize for the inconvenience. We are working to resolve the issue as soon as possible.
We will provide an update on this thread.
Nitin