I created a GCS bucket on https://console.cloud.google.com/storage/
and successfully mounted it on an instance with gcsfuse.
However, when I try to write to the mounted directory, I get an Input/output error.
fuse_debug: Op 0x00000003 connection.go:395] <- LookUpInode (parent 1, name "test")
gcs: Req 0x1: <- StatObject("test/")
gcs: Req 0x2: <- StatObject("test")
gcs: Req 0x1: -> StatObject("test/") (31.355698ms): gcs.NotFoundError: googleapi: Error 404: Not Found, notFound
gcs: Req 0x2: -> StatObject("test") (51.589538ms): gcs.NotFoundError: googleapi: Error 404: Not Found, notFound
fuse_debug: Op 0x00000003 connection.go:476] -> Error: "no such file or directory"
fuse_debug: Op 0x00000004 connection.go:395] <- MkDir (parent 1, name "test")
gcs: Req 0x3: <- CreateObject("test/")
gcs: Req 0x3: -> CreateObject("test/") (13.513239ms): googleapi: Error 403: Insufficient Permission, insufficientPermissions
fuse_debug: Op 0x00000004 connection.go:476] -> Error: "CreateChildDir: googleapi: Error 403: Insufficient Permission, insufficientPermissions"
fuse: 2016/06/09 02:12:40.128885 *fuseops.MkDirOp error: CreateChildDir: googleapi: Error 403: Insufficient Permission, insufficientPermissions
It appears from the --foreground --debug_fuse output that you're using credentials that aren't allowed to write to the bucket. They are probably read-only (StatObject didn't return a 403, and gcsfuse checks at startup that it can list the bucket).
Try giving the docs about credentials a careful read. In particular, if you're getting credentials automatically on a Google Compute Engine VM, you probably forgot to create the VM with the storage-full scope.
In general, running gcsfuse with --foreground (and perhaps --debug_fuse) is the quickest way to get some indication of what the error is when it happens.
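If you want to confirm which scopes the VM's default service account actually has, you can ask the metadata server directly. A minimal sketch in Python (assuming the requests library is installed; run it on the VM itself):
import requests

# Ask the GCE metadata server for the default service account's OAuth scopes.
# The Metadata-Flavor header is mandatory; without it the server returns 403.
resp = requests.get(
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes",
    headers={"Metadata-Flavor": "Google"},
)
resp.raise_for_status()
print(resp.text)
If the output only contains https://www.googleapis.com/auth/devstorage.read_only, that matches the behaviour above: reads and listing work, but writes fail. The storage-full scope shows up as https://www.googleapis.com/auth/devstorage.full_control.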
Related
I am trying to configure storage for Loki logs, and I have configured a GCS bucket.
But when I try to view the Loki logs, I get a 403 error as follows:
""2822-18-21 18:43:52 level-error ts-2822-18-21T05:13:52.8647427222
caller=flush.go:146 org_id=fake msg="failed to flush user err="store put
chunk: googleapi: Error 483: Access denied., forbidden""
What might be the reason?
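A quick way to narrow this down is to take the same service account Loki runs with and try the same kind of operation directly, i.e. writing an object to the bucket. A rough sketch, assuming the google-cloud-storage client and a hypothetical key file path and bucket name:
from google.cloud import storage

# Placeholders: use the key file and bucket that Loki is actually configured with.
client = storage.Client.from_service_account_json("/path/to/loki-sa-key.json")
bucket = client.bucket("my-loki-chunks-bucket")

# Loki's "store put chunk" is an object write, so test exactly that.
blob = bucket.blob("permission-check/test-object")
blob.upload_from_string("test")
print("write succeeded")
If this also fails with a 403, the service account is simply missing write permission on the bucket (for example the Storage Object Creator or Storage Object Admin role).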
I am trying to execute the following command:
mssparkutils.fs.ls("abfss://mycontainer#myadfs.dfs.core.windows.net/myfolder/")
I get the error:
Py4JJavaError: An error occurred while calling z:mssparkutils.fs.ls.
: java.nio.file.AccessDeniedException: Operation failed: "This request is not authorized to perform this operation.", 403, GET, https://myadfs.dfs.core.windows.net/mycontainer?upn=false&resource=filesystem&maxResults=5000&directory=myfolder&timeout=90&recursive=false, AuthorizationFailure, "This request is not authorized to perform this operation.
I followed the steps described in this link
by granting myself and my Synapse workspace the "Storage Blob Data Contributor" role at the container (file system) level.
Even so, I still get this persistent error. Am I missing other steps?
I got the same kind of error in my environment. I just followed this official document and reproduced the steps, and now it's working fine for me. You can follow the code below; it should solve your problem.
Sample code:
from pyspark.sql import SparkSession

account_name = 'your_storage_account_name'
container_name = 'your_container_name'
relative_path = 'your_folder_path'
linked_service_name = 'your_linked_service_name'

# Get a SAS token for the storage account through the linked service
sas_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)

# Access Blob Storage: build the wasbs path and register the SAS token
path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (container_name, account_name, relative_path)
spark.conf.set('fs.azure.sas.%s.%s.blob.core.windows.net' % (container_name, account_name), sas_token)
print('Remote blob path: ' + path)
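After the configuration above, the listing that failed in the question should work against the wasbs path. A short usage sketch, reusing the path variable from the snippet:
# List the folder through the SAS-configured wasbs path
for f in mssparkutils.fs.ls(path):
    print(f.name, f.size)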
Updated answer
Reference for configuring Spark in a PySpark notebook:
https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/notebook-this-request-is-not-authorized-to-perform-this/ba-p/1712566
My source is parquet files in ADLS Gen2. All the parquet files are part files of size 10-14 MB, and the total size should be around 80 GB.
The sink is an Azure Synapse table.
The copy method is PolyBase. I get the error below within 5 seconds of execution:
ErrorCode=PolybaseOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse. Operation: 'Create external table'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:
HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, URL',Source=.Net SqlClient Data Provider,SqlErrorNumber=105019,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=105019,State=1,Message=External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:
HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation.", 403, HEAD,
I've seen this error due to failed authentication; check whether the authorization header and/or signature is wrong.
For example, create the database scoped credential using your ADLS Gen2 storage account access key:
CREATE DATABASE SCOPED CREDENTIAL [MyADLSGen2Cred] WITH
IDENTITY='user',
SECRET='zge . . . 8V/rw=='
The external data source is created as follows:
CREATE EXTERNAL DATA SOURCE [MyADLSGen2] WITH (
TYPE=HADOOP,
LOCATION='abfs://myblob@pabechevb.dfs.core.windows.net',
CREDENTIAL=[MyADLSGen2Cred])
You can specify wasb instead of abfs, and if you're using SSL, specify it as abfss. Then the external table is created as follows:
CREATE EXTERNAL TABLE [dbo].[ADLSGen2] (
[Content] varchar(128))
WITH (
LOCATION='/',
DATA_SOURCE=[MyADLSGen2],
FILE_FORMAT=[TextFileFormat])
You can find additional information in my book "Hands-On Data Virtualization with Polybase".
I am trying to run the Google Cloud Python SDK from inside a k8s pod running on Google Compute Engine. There is a service account attached to the VM which gives it access to Secret Manager. I am able to access Secret Manager from the host; however, running the Python SDK from the k8s pod complains about not being able to access the metadata service:
>>> secret_id = 'unskript_test'
>>> name = client.secret_path(project_id, secret_id)
>>> response = client.get_secret(request={"name": name})
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 67, in error_remapped_callable
return callable_(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/opt/conda/lib/python3.7/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Getting metadata from plugin failed with error: Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Compute Engine Metadata server unavailable"
debug_error_string = "{"created":"#1630634901.103779641","description":"Getting metadata from plugin failed with error: Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Compute Engine Metadata server unavailable","file":"src/core/lib/security/credentials/plugin/plugin_credentials.cc","file_line":90,"grpc_status":14}"
>
metadata.google.internal doesn't get resolved from the k8s pod:
jovyan#jovyan-25ca6c8c-157d-49e5-9366-f9d57fcb7a9f:~$ wget http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
--2021-09-03 02:11:19-- http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
Resolving metadata.google.internal (metadata.google.internal)... failed: Name or service not known.
wget: unable to resolve host address ‘metadata.google.internal’
However, the host is able to resolve it:
ubuntu#gcp-test-proxy:~$ wget http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
--2021-09-03 02:11:27-- http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
Resolving metadata.google.internal (metadata.google.internal)... 169.254.169.254
Connecting to metadata.google.internal (metadata.google.internal)|169.254.169.254|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-09-03 02:11:27 ERROR 403: Forbidden.
How can I make the pod resolve metadata.google.internal?
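To separate DNS resolution from actual reachability, the metadata server can also be reached by its fixed IP, 169.254.169.254, from inside the pod. A small sketch using requests (note that the Metadata-Flavor: Google header is required, which is also why the plain wget above gets a 403 even on the host):
import requests

# 169.254.169.254 is the fixed IP of the GCE metadata server;
# the Metadata-Flavor header is mandatory, otherwise it answers 403.
resp = requests.get(
    "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/email",
    headers={"Metadata-Flavor": "Google"},
    timeout=5,
)
print(resp.status_code, resp.text)
If this works by IP while metadata.google.internal still fails to resolve, the problem is the pod's DNS configuration rather than access to the metadata server itself.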
I have a problem with Swift. When I execute
swift -V 2.0 -A http://xxx.xxx.x.xx:5000/v2.0/ -U cookbook:demo -K openstack stat
this is the output:
Auth GET failed: http://xxx.xxx.x.xx:5000/v2.0/tokens 500 Internal Server Error
Is there any solution for me? :)
I hit this error while executing 'swift list'.
Error: Account GET failed ... 503 Internal Server Error (first 60 chars of response)...
On the Swift storage node, check the log '/var/log/swift/account-server.log'; it contains this piece of the error message: [Errno 13] Permission denied: '/srv/node/sdb1/accounts'
According to the error message, the root cause is that, on the Swift storage node, the swift user doesn't have permission on the directory '/srv/node/'. Grant permission with: chown -R swift:swift /srv/node
That solved the problem. Hope this is helpful.