CEPH S3 Exception while Listing Blobs - kubernetes

I have created an S3 bucket backed by Ceph, and through the Java S3 client and the S3 object gateway I am listing a directory in a paginated fashion. The listing randomly fails, sometimes after listing about 1,100 blobs in batches and sometimes after about 2,000, and I am not able to figure out how to debug this. Below is the exception I get. Notice that there is a request ID in it; I think I can filter the logs based on that, but the question is where to find those logs. I have checked the S3 gateway pod logs but couldn't find anything matching there, so please let me know where I should look.
com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 500; Error Code: UnknownError; Request ID: tx00000000000000000e7df-005e626049-1146-rook-ceph-store; S3 Extended Request ID: 1146-rook-ceph-store-rook-ceph-store), S3 Extended Request ID: 1146-rook-ceph-store-rook-ceph-store
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1799)
This is my code to iterate through the blobs. It is the non-paginated version, but both it and the paginated version throw the same exception after listing a few hundred blobs:
ObjectListing objects = conn.listObjects(bucket.getName());
do {
    for (S3ObjectSummary objectSummary : objects.getObjectSummaries()) {
        System.out.println(objectSummary.getKey() + "\t" +
                objectSummary.getSize() + "\t" +
                StringUtils.fromDate(objectSummary.getLastModified()));
    }
    objects = conn.listNextBatchOfObjects(objects);
} while (objects.isTruncated());
So, any pointers on how to debug this would be helpful. Thanks.

Try ListObjectsV2.
Returns some or all (up to 1,000) of the objects in a bucket.
https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html
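The question's code uses the Java SDK; for illustration only, here is what a paginated listing against the V2 API looks like as a minimal Python/boto3 sketch (the endpoint URL, credentials, and bucket name are placeholders for your RGW setup):
import boto3

# Placeholders: point these at your Ceph RGW (rook-ceph) endpoint and keys.
s3 = boto3.client(
    "s3",
    endpoint_url="http://rook-ceph-rgw-rook-ceph-store.example",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# ListObjectsV2 returns at most 1,000 keys per call; the paginator
# follows NextContinuationToken until the listing is exhausted.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-bucket"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"], obj["LastModified"])
In the Java SDK the equivalent is AmazonS3.listObjectsV2 with ListObjectsV2Request/ListObjectsV2Result and its continuation token.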

Related

Getting an error while using copy activity (polybase) in adf to copy parquet files in ADLS gen2 to Azure synapse table

My source is Parquet files in ADLS Gen2. All the Parquet files are part files of 10-14 MB; the total size should be around 80 GB.
The sink is an Azure Synapse table.
The copy method is PolyBase. I get the error below within about 5 seconds of execution:
ErrorCode=PolybaseOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse. Operation: 'Create external table'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:
HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, URL',Source=.Net SqlClient Data Provider,SqlErrorNumber=105019,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=105019,State=1,Message=External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:
HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation.", 403, HEAD,
I've seen this error caused by failed authentication; check whether the authorization header and/or signature is wrong.
For example, create the scoped credential using your ADLS Gen2 storage account access key:
CREATE DATABASE SCOPED CREDENTIAL [MyADLSGen2Cred] WITH
    IDENTITY='user',
    SECRET='zge . . . 8V/rw=='
The external data source is created as follows:
CREATE EXTERNAL DATA SOURCE [MyADLSGen2] WITH (
    TYPE=HADOOP,
    LOCATION='abfs://myblob@pabechevb.dfs.core.windows.net',
    CREDENTIAL=[MyADLSGen2Cred])
You can specify wasb instead of abfs, and if you're using SSL, specify it as abfss. Then the external table is created as follows:
CREATE EXTERNAL TABLE [dbo].[ADLSGen2] (
    [Content] varchar(128))
WITH (
    LOCATION='/',
    DATA_SOURCE=[MyADLSGen2],
    FILE_FORMAT=[TextFileFormat])
You can find additional information in my book "Hands-On Data Virtualization with Polybase".

How to create a bucket using the python SDK?

I'm trying to create a bucket in Cloud Object Storage using Python. I have followed the instructions in the API docs.
This is the code I'm using:
COS_ENDPOINT = "https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints"
# Create client
cos = ibm_boto3.client("s3",
                       ibm_api_key_id=COS_API_KEY_ID,
                       ibm_service_instance_id=COS_INSTANCE_CRN,
                       config=Config(signature_version="oauth"),
                       endpoint_url=COS_ENDPOINT
                       )
s3 = ibm_boto3.resource('s3')

def create_bucket(bucket_name):
    print("Creating new bucket: {0}".format(bucket_name))
    s3.Bucket(bucket_name).create()
    return

bucket_name = 'test_bucket_442332'
create_bucket(bucket_name)
I'm getting the error below. I tried setting CreateBucketConfiguration={"LocationConstraint": "us-south"}, but it doesn't seem to work:
"ClientError: An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to."
Resolved by going to https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-endpoints#endpoints
and choosing the endpoint specific to the region I need. The "Endpoint" provided with the credentials is not the actual endpoint.
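For illustration, roughly how the same setup looks with a region-specific endpoint (us-south is assumed here; pick yours from the page above, and COS_API_KEY_ID / COS_INSTANCE_CRN are the same placeholders as in the question):
import ibm_boto3
from ibm_botocore.client import Config

# Region-specific public endpoint (us-south assumed); taken from the
# endpoints page, not from the "endpoints" URL shipped with the credentials.
COS_ENDPOINT = "https://s3.us-south.cloud-object-storage.appdomain.cloud"

cos = ibm_boto3.resource("s3",
                         ibm_api_key_id=COS_API_KEY_ID,
                         ibm_service_instance_id=COS_INSTANCE_CRN,
                         config=Config(signature_version="oauth"),
                         endpoint_url=COS_ENDPOINT)

# With a regional endpoint no LocationConstraint is needed; the hyphenated
# name keeps to S3-style bucket naming rules.
cos.Bucket("test-bucket-442332").create()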

ACL verification fails for Data Factory Service Principal, although it has rwx permissions

I have a U-SQL script that executes successfully from VS Code with my personal credentials, but fails when triggered from a Data Factory pipeline. My personal account has Owner rights on the Azure subscription. ADF uses Service Principal authentication with Data Lake Analytics & Store.
I am using Data Factory V2 and Data Lake Gen1 with the Default Integration Runtime. ADLA Firewall is disabled.
The U-SQL script is very simple: it just reads data from a CSV file and tries to write it to another CSV file. This is the whole script:
@companies =
    EXTRACT
        Id string,
        Name string
    FROM @InputFile
    USING Extractors.Csv(skipFirstNRows: 1);

OUTPUT @companies
TO @OutputFile
USING Outputters.Csv(outputHeader: true);
The parameters InputFile and OutputFile contain the ADL paths to the input and output data. These parameters are passed from Data Factory. The first stage of the script ("Extract") executes successfully, and the graph shows that the error occurs in the "PodAggregate" stage. A similar error occurs if I try to write the output to a managed table instead of a CSV file.
The high level error message in Data Factory is:
Error Id: VertexFailedFast, Error Message: Vertex failed with a fail-fast error.
Data Lake Analytics gives the more detailed error:
E_STORE_USER_ERROR: A user error has been reported when reading or writing data.
Component: STORE
Description: Operation 'Open::Wait' returned error code -2096559454 'Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.' for stream 'adl://[myadl].azuredatalakestore.net/adla/tmp/8a1495dc-8d80-44b9-a724-f2a0a963b3c8/stack_test/Companies.csv---6F21F973-45B9-46C7-805F-192672C99393-9_0_1.dtf%23N'.
Details:
Stream name 'adl://[myadl].azuredatalakestore.net/adla/tmp/8a1495dc-8d80-44b9-a724-f2a0a963b3c8/stack_test/Companies.csv---6F21F973-45B9-46C7-805F-192672C99393-9_0_1.dtf%23N'
Thu Jul 19 02:35:42 2018: Store user error, Operation:[Open::Wait], ErrorEither the resource does not exist or the current user is not authorized to perform the requested operation
7ffd8c4195b7 ScopeEngine!?ToStringInternal#KeySampleCollection#SSLibV3#ScopeEngine##AEAA?AV?$basic_string#DU?$char_traits#D#std##V?$allocator#D#2##std##XZ + 11b7
7ffd8c39a96d ScopeEngine!??0ExceptionWithStack#ScopeEngine##QEAA#W4ErrorNumber#1#AEBV?$initializer_list#VScopeErrorArg#ScopeCommon###std##_N#Z + 13d
7ffd8c3abe3e ScopeEngine!??0DeviceException#ScopeEngine##QEAA#AEAVBlockDevice#1#AEBV?$basic_string#DU?$char_traits#D#std##V?$allocator#D#2##std##J#Z + 1de
7ffd8c3f8c7b ScopeEngine!?GetTotalIoWaitTime#Statistics#Scanner#ScopeEngine##QEAA_JXZ + 133b
7ffd8c3f87dc ScopeEngine!?GetTotalIoWaitTime#Statistics#Scanner#ScopeEngine##QEAA_JXZ + e9c
7ffd9157780d ScopeCodeGenEngine!ScopeEngine::CosmosOutput::IssueWritePage + 4d d:\data\yarnnm\local\usercache\f675bad0-3d48-4f08-9933-d7cb614ec7a8\appcache\application_1531519980045_88416\container_e194_1531519980045_88416_01_000001\wd\scopeio.h line:6063
7ffd9156be7d ScopeCodeGenEngine!ScopeEngine::TextOutputStream,ScopeEngine::CosmosOutput>::Write + 2bd d:\data\yarnnm\local\usercache\f675bad0-3d48-4f08-9933-d7cb614ec7a8\appcache\application_1531519980045_88416\container_e194_1531519980045_88416_01_000001\wd\scopeio.h line:7828
7ffd91579290 ScopeCodeGenEngine!ScopeEngine::TextOutputPolicy::SerializeHeader + 30 d:\data\yarnnm\local\usercache\f675bad0-3d48-4f08-9933-d7cb614ec7a8\appcache\application_1531519980045_88416\container_e194_1531519980045_88416_01_000001\wd__scopecodegenengine__.dll.cpp line:514
7ffd91574ba7 ScopeCodeGenEngine!ScopeEngine::Outputer,ScopeEngine::BinaryInputStream,ScopeEngine::ExecutionStats>,SV1_Extract_out0,ScopeEngine::ScopeUnionAll,ScopeEngine::BinaryInputStream,ScopeEngine::ExecutionStats>,SV1_Extract_out0>,3>,SV1_Extract_out0,1>,SV1_Extract_out0,ScopeEngine::TextOutputPolicy,ScopeEngine::TextOutputStream,ScopeEngine::CosmosOutput>,0,ScopeEngine::ExecutionStats,ScopeEngine::DummyStatsWriter>::DoOutput + 27 d:\data\yarnnm\local\usercache\f675bad0-3d48-4f08-9933-d7cb614ec7a8\appcache\application_1531519980045_88416\container_e194_1531519980045_88416_01_000001\wd\scopeoperators.h line:5713
7ffd91582258 ScopeCodeGenEngine!SV2_PodAggregate_execute + 658 d:\data\yarnnm\local\usercache\f675bad0-3d48-4f08-9933-d7cb614ec7a8\appcache\application_1531519980045_88416\container_e194_1531519980045_88416_01_000001\wd__scopecodegenengine__.dll.cpp line:722
7ffd8c36571d ScopeEngine!??1OutputFileInfo#ScopeEngine##QEAA#XZ + 60d
7ffd8c397aa0 ScopeEngine!?RunUserCode#Vertex#ScopeEngine##SA_N_NAEBV?$function#$$A6AXXZ#std###Z + 1b0
7ffd8c397a4e ScopeEngine!?RunUserCode#Vertex#ScopeEngine##SA_N_NAEBV?$function#$$A6AXXZ#std###Z + 15e
7ffd8c397915 ScopeEngine!?RunUserCode#Vertex#ScopeEngine##SA_N_NAEBV?$function#$$A6AXXZ#std###Z + 25
7ffd8c365c7f ScopeEngine!??1OutputFileInfo#ScopeEngine##QEAA#XZ + b6f
7ffd8c3950c4 ScopeEngine!?Execute#Vertex#ScopeEngine##SA_NAEBVVertexStartupInfo#2#PEAUVertexExecutionInfo#2##Z + 3f4
7ff731d8ae8d scopehost!(no name)
7ff731d8adbd scopehost!(no name)
7ffd8c4274d9 ScopeEngine!?Execute#VertexHostBase#ScopeEngine##IEAA_NAEAVVertexStartupInfo#2##Z + 379
7ff731d8d236 scopehost!(no name)
7ff731d6a966 scopehost!(no name)
7ff731d98dac scopehost!(no name)
7ffd9e4713d2 KERNEL32!BaseThreadInitThunk + 22
7ffd9e5e54e4 ntdll!RtlUserThreadStart + 34
The Service Principal account has Owner permissions on Data Lake Analytics. The SP account also has (default) rwx permissions on the /stack_test subdirectory in Data Lake Store and all its files and children, and x permission on the root directory. The error message seems to say that the SP account is missing permissions on the destination file (/stack_test/Companies.csv), but I can explicitly see that it has rwx on that file. Which permissions am I still missing?
For reference, the script and the Data Factory resources necessary to reproduce this problem can be found at: https://github.com/lehmus/StackQuestions/tree/master/ADF_ADLA_Auth.

IBM Cloud Object Storage error creating bucket - 'creation failed, vault name invalid.'

I'm trying to create two buckets in IBM Cloud Object Storage:
cos = ibm_boto3.resource('s3',
                         ibm_api_key_id=cos_credentials['apikey'],
                         ibm_service_instance_id=cos_credentials['resource_instance_id'],
                         ibm_auth_endpoint=auth_endpoint,
                         config=Config(signature_version='oauth'),
                         endpoint_url=service_endpoint)

import datetime

# valid bucket format is ^[a-zA-Z0-9.\-_]{1,255}$
bucket_uid = datetime.datetime.now().isoformat().replace(':', '')
buckets = ['training-data-' + bucket_uid, 'training-results-' + bucket_uid]

for bucket in buckets:
    if not cos.Bucket(bucket) in cos.buckets.all():
        print('Creating bucket "{}"...'.format(bucket))
        try:
            cos.create_bucket(Bucket=bucket)
        except ibm_boto3.exceptions.ibm_botocore.client.ClientError as e:
            print('Error: {}.'.format(e.response['Error']['Message']))
Returns the error:
Creating bucket "training-data-2018-07-11T090425.347277"...
Error: Container training-data-2018-07-11T090425.347277 creation failed, vault name invalid.
Creating bucket "training-results-2018-07-11T090425.347277"...
Error: Container training-results-2018-07-11T090425.347277 creation failed, vault name invalid.
What is the issue here? Why is the vault name invalid?
If I change bucket_uid to '12345', the buckets are created OK.
The bucket name length was the issue.
Shortening the timestamp was the answer for me. It's a shame that the error message wasn't a bit more informative.
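For example, a minimal sketch of one way to shorten the suffix (kept compact, lowercase, and free of ':' and 'T', which also stays within S3-style naming rules); the exact format is just an illustration:
import datetime

# Compact, lowercase timestamp, e.g. '20180711-090425'
bucket_uid = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
buckets = ['training-data-' + bucket_uid, 'training-results-' + bucket_uid]
# -> e.g. 'training-data-20180711-090425'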

IBM COS - can't get or create bucket with boto

from boto.s3.connection import S3Connection
conn = S3Connection('****', '****', host='s3.eu-geo.objectstorage.softlayer.net')
mybucket = conn.get_bucket('mybucket')
Returns
/anaconda3/lib/python3.6/site-packages/boto/s3/connection.py in head_bucket(self, bucket_name, headers)
551 err.error_code = 'NoSuchBucket'
552 err.error_message = 'The specified bucket does not exist'
--> 553 raise err
554 else:
555 raise self.provider.storage_response_error(
S3ResponseError: S3ResponseError: 404 Not Found
However, if I try to create the bucket:
conn.create_bucket('mybucket')
S3CreateError: S3CreateError: 409 Conflict
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Error><Code>BucketAlreadyExists</Code><Message>Container mybucket exists</Message><Resource>/mybucket/</Resource><RequestId>****</RequestId><httpStatusCode>409</httpStatusCode></Error>
Bucket names are globally unique in IBM Cloud Object Storage (as in AWS S3). Deletion of a bucket is eventually consistent and takes about 10 minutes to propagate, which means the name of a recently deleted bucket (deleted by you or any other user) will take some time to become available again. The error you mention shows that the bucket 'mybucket' has been recently deleted and is in the period where the name is not yet available. It is generally suggested to use a specific prefix in front of bucket names when the name is something common (like mybucket). Check out this excerpt from the IBM COS API docs:
A DELETE issued to an empty bucket deletes the bucket. After deleting a bucket the name will be held in reserve by the system for 10 minutes, after which it will be released for re-use. Only empty buckets can be deleted. This operation does not make use of operation specific headers, query parameters, or payload elements.
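As an illustration, a minimal sketch of creating a bucket under a more distinctive, prefixed name (credentials, host, and the prefix itself are placeholders):
import uuid
from boto.s3.connection import S3Connection

conn = S3Connection('****', '****', host='s3.eu-geo.objectstorage.softlayer.net')

# A distinctive prefix plus a random suffix avoids collisions with names
# other tenants already hold and with names still inside the 10-minute
# reservation window after deletion.
bucket_name = 'myproject-dev-' + uuid.uuid4().hex[:8]
mybucket = conn.create_bucket(bucket_name)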