I am trying to delete a file in Google Cloud Storage from Oozie. I created a dummy script and executed it through Oozie, with a prepare statement that says "delete gs://...".
It is not working, and the error is "schema gs not supported". How else could I delete Google Cloud Storage files in an Oozie workflow?
I found the solution.
I created a shell script that uses the hadoop fs -rm command to delete the files from Google Cloud Storage, and scheduled that shell script from Oozie as well. This solved my issue.
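For reference, a minimal sketch of such a script, assuming the GCS connector is configured on the cluster so that hadoop fs understands gs:// paths; the script name and bucket path are placeholders:

#!/bin/bash
# delete_gcs_files.sh -- hypothetical name; invoked from an Oozie shell action
# -r removes recursively, -skipTrash deletes the objects outright
hadoop fs -rm -r -skipTrash "gs://my-bucket/path/to/delete/*"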
I am trying to connect my PowerShell runbook to a storage account (blob) to read .sql files and execute them in my Azure SQL Database. I want to:
Connect to a blob container
Read the script on .sql file
Execute the script on the db
When I try Invoke-Sqlcmd, it requires dedicated storage to hold the file. However, runbooks run serverless, and as far as I know there is no storage I can use for the files.
Is there a way to just read the files (without moving them around) from a PowerShell runbook, or can I store the files somewhere so the runbook can read them?
I am new to Google Cloud. I created a Cloud SQL instance and need to restore the data from a .bak file. I have the .bak file in a GCS bucket, and I am trying to restore it using SQL Server Management Studio -> Task -> Restore, but I'm not able to access the file.
Can anyone help me with the procedure on how to restore from a .bak file?
You need to give the Cloud SQL service account access to the bucket where the file is saved.
In Cloud Shell, run the following:
gcloud sql instances describe [INSTANCE_NAME]
In the output, search for the field "serviceAccountEmailAddress" and copy the service account email.
Then, again in Cloud Shell, run the following:
gsutil iam ch serviceAccount:<SERVICE_ACCOUNT_EMAIL>:legacyBucketWriter gs://<BUCKET_NAME>
gsutil iam ch serviceAccount:<SERVICE_ACCOUNT_EMAIL>:objectViewer gs://<BUCKET_NAME>
That should give the service account permission to access the bucket and retrieve the file. Also, follow the Cloud SQL import guide for the import itself; keep in mind that the import will overwrite all the data in the database.
Also remember that:
You cannot import a database that was exported from a higher version of SQL Server. For example, if you exported a SQL Server 2017 Enterprise version, you cannot import it into a SQL Server 2017 Standard version.
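Once the service account has access, the import itself can be run from Cloud Shell; a minimal sketch, with the instance, bucket, file, and database names as placeholders:

gcloud sql import bak [INSTANCE_NAME] gs://[BUCKET_NAME]/[FILE_NAME].bak \
    --database=[DATABASE_NAME]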
I am invoking a Dataflow job using the gcloud CLI. My command looks like this:
gcloud dataflow jobs run avrojob4 \
--gcs-location=gs://dataflow-templates/latest/Cloud_Bigtable_to_GCS_Avro \
--region=europe-west1 \
--parameters bigtableProjectId="project-id",bigtableInstanceId="instance-id",bigtableTableId="table-id",outputDirectory="gs://avro-data/avrojob4/",filenamePrefix="avrojob4-"
and I get the following error:
ERROR: Failed to write a file to temp location 'gs://dataflow-staging-us-central1-473832897378/temp/'. Please make sure that the bucket for this directory exists, and that the project under which the workflow is running has the necessary permissions to write to it.
Can someone help me understand how to pass a specific temp location through the above command?
There is no --temp-location flag for this command:
https://cloud.google.com/sdk/gcloud/reference/dataflow/jobs/run
I suspect you're attempting to solve the issue by inventing that flag but, as you've seen, this does not work.
Does the bucket exist?
Does the Dataflow service account have suitable permissions to write to it?
Can you gsutil ls gs://dataflow-staging-us-central1-473832897378?
If yes, then it's likely that the Dataflow service account does not have permission to write to the bucket. Please review the instructions at the following link for granting the correct permissions to the Dataflow (!) service account:
https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#accessing_cloud_storage_buckets_across_google_cloud_platform_projects
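For reference, a sketch of checking the bucket and granting write access. It assumes the workers run as the default Compute Engine service account; replace [PROJECT_NUMBER], or substitute your custom service account email:

# Does the bucket exist and can you list it?
gsutil ls gs://dataflow-staging-us-central1-473832897378

# Grant the worker service account read/write access to the bucket
# (assumption: default Compute Engine service account is in use)
gsutil iam ch \
    serviceAccount:[PROJECT_NUMBER]-compute@developer.gserviceaccount.com:objectAdmin \
    gs://dataflow-staging-us-central1-473832897378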
I'm working with an Azure PostgreSQL database and am using the Cloud Shell to run psql scripts without problems. I'm now trying to load some shp files via the shp2pgsql command. The Cloud Shell responds with:
bash: shp2pgsql: command not found
Is it possible at all to use shp2pgsql with the Cloud Shell, or am I missing something? I've already successfully created the postgis extension on the PostgreSQL server.
Unfortunately, it seems that you cannot run the shp2pgsql command in the Azure Cloud Shell. Cloud Shell is just an interactive, browser-accessible shell for managing Azure resources, and only a limited set of tools comes pre-installed with it; you can get more details about what is included from Features & tools for Azure Cloud Shell.
If you want to do something more involved like this, I suggest running it on a dedicated Azure VM instead. Hope this helps.
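For example, on an Ubuntu VM the workflow could look roughly like this; the package name, file paths, and connection string are placeholders and may differ for your distro and server:

# shp2pgsql ships with the PostGIS client tools
sudo apt-get install -y postgis

# Convert the shapefile to SQL and pipe it straight into the Azure database
shp2pgsql -s 4326 -I /data/myshape.shp public.myshape | \
    psql "host=myserver.postgres.database.azure.com port=5432 dbname=mydb user=myuser sslmode=require"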
I have my log files on an EC2 instance and want to load them into Redshift. Two questions:
Do I have to copy the log file to S3 before proceeding, or can I copy directly from my EBS volume?
I can see that I can use the COPY command from SQL Workbench or Data Pipeline. But can I use it from my EC2 instance itself? Which AWS CLI do I need to install? http://docs.aws.amazon.com/cli/latest/reference/redshift/ does not list a copy command.
Not really. Redshift allows you to COPY from a remote host, which, in your case, would be your EC2 instance; see the documentation on COPY from remote hosts.
The link you've referred to covers cluster management commands. To run SQL queries on your cluster, you can use the psql tool; see its documentation.
You can copy the data directly from EC2, but my recommendation is to save it to S3 first, also as a backup.
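A minimal sketch of that route, run from the EC2 instance itself; the bucket, file, table, cluster endpoint, and IAM role below are all placeholders:

# Stage the log file in S3 first
aws s3 cp /var/log/myapp/app.log s3://my-log-bucket/logs/app.log

# Then run the Redshift COPY command through psql from the same instance
# (assumes the cluster has an attached IAM role with read access to the bucket)
psql -h mycluster.abc123xyz.us-east-1.redshift.amazonaws.com -p 5439 -U myuser -d mydb -c "
COPY my_log_table
FROM 's3://my-log-bucket/logs/app.log'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
DELIMITER ' ';  -- adjust format options to match your log layout
"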
All the documentation available online was confusing me. In the end, the solution was to write a simple Java program that opens a connection with DriverManager.getConnection() and issues the COPY command via stmt.executeUpdate(), and it worked seamlessly. The only catch is that executeUpdate() did not return the number of records inserted.