I am invoking a Dataflow job using the gcloud CLI. My command looks like this:
gcloud dataflow jobs run avrojob4 \
--gcs-location=gs://dataflow-templates/latest/Cloud_Bigtable_to_GCS_Avro \
--region=europe-west1 \
--parameters bigtableProjectId="project-id",bigtableInstanceId="instance-id",bigtableTableId="table-id",outputDirectory="gs://avro-data/avrojob4/",filenamePrefix="avrojob4-"
and I get this error:
ERROR: Failed to write a file to temp location 'gs://dataflow-staging-us-central1-473832897378/temp/'. Please make sure that the bucket for this directory exists, and that the project under which the workflow is running has the necessary permissions to write to it.
Can someone help me pass the temp location as a specific value through the above command?
There is no --temp-location flag for this command:
https://cloud.google.com/sdk/gcloud/reference/dataflow/jobs/run
I suspect you're attempting to solve the issue by adding that flag yourself but, as you've seen, this does not work.
Does the bucket exist?
Does the Dataflow service account have suitable permissions to write to it?
Can you gsutil ls gs://dataflow-staging-us-central1-473832897378?
If yes, then it's likely that the Dataflow service does not have permission to write to the bucket. Please review the instructions in the following link for adding the correct permissions for the Dataflow (!) service account:
https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#accessing_cloud_storage_buckets_across_google_cloud_platform_projects
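If the bucket exists but the write still fails, one way to resolve it is to grant the Dataflow service agent write access on the staging bucket. A minimal sketch, assuming the number in the bucket name (473832897378) is your project number:
# grant the Dataflow service agent object-admin rights on the staging bucket
gsutil iam ch serviceAccount:service-473832897378@dataflow-service-producer-prod.iam.gserviceaccount.com:roles/storage.objectAdmin gs://dataflow-staging-us-central1-473832897378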
Last week I managed to successfully deploy an AWS Lambda function (verified in the AWS console). This morning, I can no longer update the Lambda function. After deleting the Lambda function and pushing the changes again, the Lambda still could not be created. Instead I get the following traceback:
Build Failed
Error: PythonPipBuilder:ResolveDependencies - The directory '/github/home/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/github/home/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Invalid requirement: 'Warning: The lock flag'
In the workflow deploy file:
- name: Build Lambda image
run: sam build
I don't know exactly what has changed to cause this error now. I tried the flag --use-container, which successfully moves on to the next step of deploying the Lambda image; however, there I now encounter further error messages. I'd like to understand why I didn't encounter this error before adding the --use-container flag. Is the --use-container flag necessary when not running the sam cli locally?
Further info
Building locally via the sam cli tool works, but not when pushed via the GitHub Actions workflow.
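For reference, the only change to that workflow step is the added flag, so the build command becomes:
# build inside a Lambda-like Docker container on the runner instead of directly on the host
sam build --use-container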
I am trying to set up GitHub CI/CD with Hasura.
I did everything as the documentation said, but since I am applying changes locally on the database, the cloud deployment says the table already exists while applying the migration (which is logically correct).
Now I want to avoid, skip, or sync migrations between cloud and local; for that, Hasura mentions a command in the same doc.
While executing this command I am getting a "resource not found" error.
command: hasura migrate apply --skip-execution --version 1631602988318 --endpoint "https://customer-support-dev.hasura.app/v1/graphql" --admin-secret 'mySecretKey'
error: time="2021-09-14T20:44:19+05:30" level=fatal msg="{\n \"path\": \"$\",\n \"error\": \"resource does not exist\",\n \"code\": \"not-found\"\n}"
This was a silly mistake: --endpoint must not contain a URL path. So its value should be https://customer-support-dev.hasura.app.
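So the working command, with the same values as above and the path stripped from the endpoint, is:
hasura migrate apply --skip-execution --version 1631602988318 --endpoint "https://customer-support-dev.hasura.app" --admin-secret 'mySecretKey'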
Running this command:
gcloud sql instances export myinstance gs://my_bucket_name/filename.csv -d "mydatabase" -t "mytable"
gives me the following error:
ERROR: (gcloud.sql.instances.import) ERROR_RDBMS
I have manually run console uploads to the bucket, which go fine. I am able to log in to the SQL instance and run queries, which makes me think that there are no permission issues. Has anybody ever seen this type of error and knows a way around it?
Note: I have googled for similar situations, and most of them point to either SQL or bucket permission issues.
Never mind. I figured out that I need to make an OAuth connection to the instance (using the JSON token generated from the gcloud APIs & Services > Credentials section) before interacting with it.
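For anyone hitting the same thing, a minimal sketch of that step with the gcloud CLI, assuming the downloaded key file is named credentials.json (the filename is illustrative):
# authenticate gcloud with the service-account key downloaded from the Credentials page
gcloud auth activate-service-account --key-file=credentials.json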
I was able to mount my Google Cloud Storage bucket using the command below:
gcsfuse -o allow_other -file-mode=660 -dir-mode=770 --uid=<uid> --gid=<gid> testbucket /path/to/domain/folder
The group includes the user apache. Apache is able to write to the mounted drive like so:
sudo -u apache echo 'Some Test Text' > /path/to/domain/folder/hello.txt
hello.txt appears in the bucket as expected. However, when I execute the PHP script below, I get an error:
<?php file_put_contents('/path/to/domain/folder/hello.txt', 'Some Test Text');
PHP Error: failed to open stream: Permission denied
echo exec('whoami'); Returns apache
I assumed this was a common use case for mounting with gcsfuse or something similar, but I seem to be the only one on the internet with this issue. I do not know if it's an issue with the way I mounted the bucket or with the service security of httpd.
I came across a similar issue.
Use the flag --implicit-dirs while mounting the Google Storage bucket using gcsfuse. More on this here.
Mounting the bucket as a folder makes the OS treat it like a regular folder which may contain files and folders. But a Google Cloud Storage bucket doesn't have a directory structure. For example, when you create a file named hello.txt in a folder named files inside a Google Cloud Storage bucket, you are not actually creating a folder and putting the file in it. The object is created in the bucket with the name files/hello.txt. More on this here and here.
To make the OS treat the GCS bucket like a hierarchical structure, you have to pass the --implicit-dirs flag to gcsfuse.
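For example, the mount command from the question with the flag added looks like this:
# mount with implicit directory support so objects under "folder" prefixes show up as directories
gcsfuse --implicit-dirs -o allow_other -file-mode=660 -dir-mode=770 --uid=<uid> --gid=<gid> testbucket /path/to/domain/folder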
Note:
I wouldn't recommend using gcsfuse in production systems, as it is beta-quality software.
I am trying to delete a file in Google Cloud Storage from Oozie. I am creating a dummy script and executing it through Oozie. I have a prepare statement where I say "delete gs://.....".
It is not working, and the error is "schema gs not supported". How else could I delete Google Cloud Storage files in an Oozie workflow?
I got the solution.
I created a shell script with the hadoop fs -rm command to delete the files from Google Cloud Storage. The shell script was scheduled from Oozie as well. That solved my issue.
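A minimal sketch of such a script, with a placeholder bucket path (the real path isn't shown above) and assuming the GCS connector is available on the cluster so the hadoop fs client can resolve the gs:// scheme:
#!/bin/bash
# delete the target object(s) from Google Cloud Storage via the Hadoop filesystem client
hadoop fs -rm -r gs://your-bucket/path/to/files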