I am trying to do what seems like a simple task, but so far I can't make it work. I want to use Cloud Composer to gather data from a Cloud SQL database and save it to GCS, but I am running into permission issues.
Here is my DAG:
from airflow.contrib.operators.gcp_sql_operator import CloudSqlInstanceExportOperator
from airflow import models
import datetime

# Export the result of the query to a CSV file in GCS.
export_body = {
    "exportContext": {
        "kind": "sql#exportContext",
        "fileType": "csv",
        "uri": "gs://mybucket/export_sql.csv",
        "csvExportOptions": {
            "selectQuery": "select count(*) as number from some_table"
        }
    }
}

# Start the DAG at midnight today.
yesterday = datetime.datetime.combine(
    datetime.datetime.today(),
    datetime.datetime.min.time())

start_date = yesterday
JOB_NAME = "job_name"

default_args = {
    'start_date': start_date,
}

with models.DAG(JOB_NAME,
                schedule_interval="@hourly",
                default_args=default_args) as dag:
    sql_export_task = CloudSqlInstanceExportOperator(body=export_body,
                                                     project_id="project_id",
                                                     instance='instance',
                                                     task_id='sql_export_task')
    sql_export_task
I created a specific service account that has these roles:
Cloud SQL Admin
Composer Worker
Storage Object Creator
When I create the environment, I specify this service account, and then I upload the above DAG to the appropriate bucket.
I get this error:
"error":
"code": 403
"message": "The service account does not have the required permissions for the bucket."
"errors":
"message": "The service account does not have the required permissions for the bucket."
"domain": "global"
"reason": "notAuthorized
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models/__init__.py", line 1491, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/contrib/operators/gcp_sql_operator.py", line 643, in execute
    body=self.body)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/gcp_api_base_hook.py", line 247, in inner_wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/gcp_sql_hook.py", line 310, in export_instance
    'Exporting instance {} failed: {}'.format(instance, ex.content))
AirflowException: Exporting instance prod failed:
"error": {
    "code": 403,
    "message": "The service account does not have the required permissions for the bucket.",
    "errors": [
        {
            "message": "The service account does not have the required permissions for the bucket.",
            "domain": "global",
            "reason": "notAuthorized"
        }
    ]
}
I thought the Storage Object Creator role should give me permission.
Should I add another role to the service account? Which one?
Any advice or solution on how to proceed would be most appreciated. Thanks!
EDIT: I added the Storage Admin role and that removed the error.
However, despite that, my DAG still doesn't seem to be working.
The Airflow interface sends mixed signals: the task has no status:
But it is somehow a success?
I checked my bucket, and the CSV file I hoped would be created is missing.
Any advice or solution on how to proceed would be most appreciated. Thanks!
I was looking into your permission-denied issue; glad that you sorted it out.
I was curious as to why the CSV is missing, and I think this might have to do with it:
If fileType is CSV, you can specify one database, either by using this property or by using the csvExportOptions.selectQuery property, which takes precedence over this property.
When exporting to CSV you have to specify the database, either by using exportContext.databases[] or by specifying it in the query you're running.
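For example, a minimal sketch of the adjusted export_body, assuming a hypothetical database named my_database (only the databases entry is new compared to your original body):

export_body = {
    "exportContext": {
        "kind": "sql#exportContext",
        "fileType": "csv",
        "uri": "gs://mybucket/export_sql.csv",
        # Name the database the table lives in; "my_database" is a placeholder.
        "databases": ["my_database"],
        "csvExportOptions": {
            "selectQuery": "select count(*) as number from some_table"
        }
    }
}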
Let me know.
Related
As part of a durable Function App deployment, I am deploying Azure Storage.
On deploying the fileServices/shares, I am getting the following error:
error": {
"code": "InvalidHeaderValue",
"message": "The value for one of the HTTP headers is not in the correct format.\nRequestId:6c0b3fb0-701a-0058-0509-a8af5d000000\nTime:2022-08-04T13:49:24.6378224Z"
}
I would appreciate any advice as this is eating up a lot of time and I am no closer to resolving it.
The section of the ARM template for the share deployment is below:
{
    "type": "Microsoft.Storage/storageAccounts/fileServices/shares",
    "apiVersion": "2021-09-01",
    "name": "[concat(parameters('storageAccount1_name'), '/default/FuncAppName')]",
    "dependsOn": [
        "[resourceId('Microsoft.Storage/storageAccounts/fileServices', parameters('storageAccount1_name'), 'default')]",
        "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccount1_name'))]"
    ],
    "properties": {
        "accessTier": "TransactionOptimized",
        "shareQuota": 5120,
        "enabledProtocols": "SMB"
    }
}
Answer to this: removing the property "accessTier": "TransactionOptimized" resolves the issue. The default value for this property is already TransactionOptimized.
Although the template exported from the Azure portal includes this property, deployment fails if it is present.
I have a scenario in which I need to export multiple CSVs from Azure Data Explorer, for which I am using the .export command. When I try to run this request multiple times, I get the following error:
TooManyRequests (429-TooManyRequests): {
    "error": {
        "code": "Too many requests",
        "message": "Request is denied due to throttling.",
        "@type": "Kusto.DataNode.Exceptions.ControlCommandThrottledException",
        "@message": "The control command was aborted due to throttling. Retrying after some backoff might succeed. CommandType: 'DataExportToFile'"
    }
}
Is there a way I can handle this without increasing the instance count?
You can alter the capacity policy: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/capacitypolicy
The export capacity section of that policy controls how many concurrent export commands the cluster will admit.
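Separately from raising the limits in the capacity policy, the throttling message itself says that retrying after some backoff might succeed, so a client-side retry loop can help when you can't (or don't want to) change the policy. A rough sketch using the azure-kusto-data package; the cluster URL, database, and export command are placeholders:

import time

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder
from azure.kusto.data.exceptions import KustoServiceError

# Placeholder cluster and database; authenticate however you normally do.
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(
    "https://mycluster.westeurope.kusto.windows.net")
client = KustoClient(kcsb)

EXPORT_COMMAND = """.export to csv (h@"https://myaccount.blob.core.windows.net/exports;<storage-key>") <| MyTable"""

def run_with_backoff(command, database="MyDatabase", max_attempts=5):
    delay = 30  # seconds; exports are heavyweight, so back off generously
    for attempt in range(1, max_attempts + 1):
        try:
            return client.execute_mgmt(database, command)
        except KustoServiceError as ex:
            # Retry only throttling errors; re-raise anything else,
            # or give up after the last attempt.
            if "throttl" not in str(ex).lower() or attempt == max_attempts:
                raise
            time.sleep(delay)
            delay *= 2  # exponential backoff

result = run_with_backoff(EXPORT_COMMAND)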
I have 2 requirements:
1: I have a cluster ID. I need to start the cluster from a Web Activity in ADF. The activity parameters look like this:
url:https://XXXX..azuredatabricks.net/api/2.0/clusters/start
body: {"cluster_id":"0311-004310-cars577"}
Authentication: Azure Key Vault Client Certificate
Upon running this activity I encounter the error below:
"errorCode": "2108",
"message": "Error calling the endpoint
'https://xxxxx.azuredatabricks.net/api/2.0/clusters/start'. Response status code: ''. More
details:Exception message: 'Cannot find the requested object.\r\n'.\r\nNo response from the
endpoint. Possible causes: network connectivity, DNS failure, server certificate validation or
timeout.",
"failureType": "UserError",
"target": "GetADBToken",
"GetADBToken" is my activity name.
The above security mechanism is working for other Databricks related activity such a running jar which is already installed on my databricks cluster.
2: I want to create a new cluster with the below settings:
url:https://XXXX..azuredatabricks.net/api/2.0/clusters/create
body: {
    "cluster_name": "my-cluster",
    "spark_version": "5.3.x-scala2.11",
    "node_type_id": "i3.xlarge",
    "spark_conf": {
        "spark.speculation": true
    },
    "num_workers": 2
}
Upon calling this API, if the cluster creation is successful I would like to capture the cluster ID in the next activity.
So what would the output of the above activity be, and how can I access it in the immediately following ADF activity?
For #2: can you please check whether changing the version
"spark_version": "5.3.x-scala2.11"
to
"spark_version": "6.4.x-scala2.11"
helps?
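On the output question: the clusters/create endpoint returns a small JSON body containing the new cluster's ID, and that response body is what the Web Activity surfaces as its output. A rough Python sketch of the same call (workspace URL, token, and the Azure node type are placeholders), just to show the response shape:

import requests

# Placeholders; substitute your workspace URL and a valid token.
DATABRICKS_HOST = "https://xxxx.azuredatabricks.net"
TOKEN = "<personal-access-or-AAD-token>"

create_body = {
    "cluster_name": "my-cluster",
    "spark_version": "6.4.x-scala2.11",
    # i3.xlarge is an AWS node type; on Azure Databricks use an Azure VM size.
    "node_type_id": "Standard_DS3_v2",
    "spark_conf": {"spark.speculation": True},
    "num_workers": 2,
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=create_body,
)
resp.raise_for_status()

# On success the response body contains the new cluster's ID.
cluster_id = resp.json()["cluster_id"]
print(cluster_id)

In ADF, a subsequent activity should then be able to reference the ID with an expression along the lines of @activity('CreateCluster').output.cluster_id (the activity name here is assumed).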
I'm having a go at creating my first Data Fusion pipeline.
The data goes from a CSV file in Google Cloud Storage to BigQuery.
I created the pipeline and carried out a preview run, which was successful, but trying to run it after deployment resulted in an error.
I pretty much accepted all the default settings apart from obviously configuring my source and destination.
Error from Log ...
com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
    "code" : 403,
    "errors" : [ {
        "domain" : "global",
        "message" : "Required 'compute.firewalls.list' permission for 'projects/xxxxxxxxxxx'",
        "reason" : "forbidden"
    } ],
    "message" : "Required 'compute.firewalls.list' permission for 'projects/xxxxxxxxxx'"
}
After deployment, the run fails.
Do note that, as part of creating an instance, you must set up permissions [0]. The role "Cloud Data Fusion API Service Agent" must be granted to the exact service account specified in that document, which has an email address that begins with "cloud-datafusion-management-sa@...".
Doing so should resolve your issue.
[0] : https://cloud.google.com/data-fusion/docs/how-to/create-instance#setting_up_permissions
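If you want to verify that binding programmatically, a quick sketch with google-api-python-client is below; the project ID is a placeholder, and roles/datafusion.serviceAgent is my assumption for the ID behind "Cloud Data Fusion API Service Agent", so double-check it in the IAM console:

from googleapiclient import discovery

PROJECT_ID = "my-project"                    # placeholder
ROLE_ID = "roles/datafusion.serviceAgent"    # assumed ID for the service-agent role

# Read the project's IAM policy and look for the Data Fusion service account.
crm = discovery.build("cloudresourcemanager", "v1")
policy = crm.projects().getIamPolicy(resource=PROJECT_ID, body={}).execute()

for binding in policy.get("bindings", []):
    if binding["role"] == ROLE_ID:
        members = [m for m in binding.get("members", [])
                   if "cloud-datafusion-management-sa" in m]
        print("Service agent members holding the role:", members or "none")
        break
else:
    print("No binding for", ROLE_ID, "found on the project.")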
When trying to restore a backup to a new Cloud SQL instance, I get the following message when using curl:
{
    "error": {
        "errors": [
            {
                "domain": "global",
                "reason": "invalidOperation",
                "message": "This operation isn't valid for this instance."
            }
        ],
        "code": 400,
        "message": "This operation isn't valid for this instance."
    }
}
When trying via the Google Cloud console, nothing happens after clicking 'OK' in the 'restore instance from backup' menu.
I'll answer even though this is a very old question; it may be useful for someone else (it would have been for me).
I just had the exact same error. My problem was that the storage capacity of the target instance was different from that of the source instance. My source instance had been accidentally deleted, so this was a bit troublesome to figure out. This checklist helped me: https://cloud.google.com/sql/docs/postgres/backup-recovery/restore#tips-restore-different-instance
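If both instances still exist, a quick way to spot this kind of mismatch is to compare the relevant settings through the Cloud SQL Admin API before retrying the restore. A rough sketch with google-api-python-client; the project and instance names are placeholders:

from googleapiclient import discovery

sqladmin = discovery.build("sqladmin", "v1beta4")

def disk_and_tier(project, instance):
    # Return the storage size and machine tier for an instance.
    inst = sqladmin.instances().get(project=project, instance=instance).execute()
    settings = inst["settings"]
    return settings.get("dataDiskSizeGb"), settings.get("tier")

source = disk_and_tier("my-project", "source-instance")
target = disk_and_tier("my-project", "target-instance")

print("source (diskGb, tier):", source)
print("target (diskGb, tier):", target)
if source[0] != target[0]:
    print("Storage capacity differs; per the checklist above, this can cause the invalidOperation error.")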