AzCopy - how to specify metadata when copying a file to blob storage

I'm trying to upload a file to Azure Blob Storage using AzCopy, but I want to include metadata.
According to the documentation, azcopy copy has a --metadata parameter where I have to provide key/value pairs as a string.
How should this string be formatted? I can't get it to work and can't find any examples...
AzCopy.exe copy .\testfile2.txt "https://storageaccount.blob.core.windows.net/upload/testfile4.txt?sastoken" --metadata ?what_here?
Thanks!
Documentation:
https://learn.microsoft.com/en-us/azure/storage/common/storage-ref-azcopy-copy#options

The string should be in this format: --metadata "name=ivan".
If you want to add multiple metadata pairs, use this format: --metadata "name=ivan;city=tokyo"
This is the command I'm using (azcopy version 10.3.4):
azcopy copy "file_path" "https://xxx.blob.core.windows.net/test1/aaa1.txt?sasToken" --metadata "name=ivan"
The test result (screenshot omitted): the metadata is set on the uploaded blob.
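If you are scripting the upload rather than calling AzCopy directly, the Azure Python SDK exposes the same metadata option. Here is a minimal sketch, assuming the azure-storage-blob package and placeholder connection string, container, and blob names:
from azure.storage.blob import BlobClient

# Hypothetical connection string, container and blob names.
blob = BlobClient.from_connection_string(
    "<connection-string>", container_name="upload", blob_name="testfile4.txt")

with open("testfile2.txt", "rb") as data:
    # The metadata dict corresponds to --metadata "name=ivan;city=tokyo".
    blob.upload_blob(data, metadata={"name": "ivan", "city": "tokyo"}, overwrite=True)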

Related

How to change the metadata of all specific file of exist objects in Google Cloud Storage?

I have uploaded thousands of files to Google Storage, and I found out that all the files are missing a content-type, so my website cannot serve them correctly.
I wonder if I can set some kind of policy, like changing all the files' content-type at the same time. For example, I have a bunch of .html files inside the bucket:
a/b/index.html
a/c/a.html
a/c/a/b.html
a/a.html
...
Is it possible to set the content-type of all the .html files in different paths with one command?
You could do:
gsutil -m setmeta -h Content-Type:text/html gs://your-bucket/**.html
There isn't a single command that achieves exactly the behavior you are looking for (one command to edit every object's metadata); however, there is a gsutil command to edit metadata which you could use in a bash script to loop through all the objects inside the bucket.
1. Option 1 is to use the gsutil setmeta command in a bash script:
# List all object names under the bucket and update the metadata key on each one.
for OBJECT in $(gsutil ls "gs://[BUCKET_NAME]/**")
do
    gsutil setmeta -h "[METADATA_KEY]:[METADATA_VALUE]" "$OBJECT"
done
2. Option 2 is to use the C++ client library to achieve the same thing:
namespace gcs = google::cloud::storage;
using ::google::cloud::StatusOr;

[](gcs::Client client, std::string bucket_name, std::string key,
   std::string value) {
  // List all the objects in the bucket; for each one, add the metadata
  // key/value pair and update the object.
  for (auto&& object_metadata : client.ListObjects(bucket_name)) {
    if (!object_metadata) {
      throw std::runtime_error(object_metadata.status().message());
    }
    gcs::ObjectMetadata desired = *object_metadata;
    desired.mutable_metadata().emplace(key, value);
    StatusOr<gcs::ObjectMetadata> updated = client.UpdateObject(
        bucket_name, object_metadata->name(), desired,
        gcs::Generation(object_metadata->generation()));
    if (!updated) {
      throw std::runtime_error(updated.status().message());
    }
  }
}
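For completeness, the same bulk update can be sketched with the Python client library; a minimal version, assuming the google-cloud-storage package and a placeholder bucket name:
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-bucket")  # placeholder bucket name

# Set Content-Type on every .html object in the bucket.
for blob in bucket.list_blobs():
    if blob.name.endswith(".html"):
        blob.content_type = "text/html"
        blob.patch()  # persist the metadata change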

ERROR: The specifed resource name contains invalid characters. ErrorCode: InvalidResourceName

ERROR: The specifed resource name contains invalid characters. ErrorCode: InvalidResourceName
2019-10-31T10:28:17.4678189Z <?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidResourceName</Code><Message>The specifed resource name contains invalid characters.
2019-10-31T10:28:17.4678695Z RequestId:
2019-10-31T10:28:17.4679207Z Time:2019-10-31T10:28:17.4598301Z</Message></Error>
I am trying to deploy my static website to blob storage in Azure with Azure DevOps, but I am getting this error. In my pipeline, I use a Grunt build, archive the output to a zip, and publish it as a pipeline artifact; then in the release I extract the files and try to upload them with an Azure CLI task.
I am using the following command:
az storage blob upload-batch --account-name something --account-key something --destination ‘$web’ --source ./
My Container name is $web
Permitted characters for container names are lowercase letters a-z, digits 0-9, and single infix hyphens:
[a-z0-9\-]
https://learn.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata
I solved this problem by removing the apostrophes around the container name:
az storage blob upload-batch --account-name something --account-key something --destination $web --source ./
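If shell quoting keeps getting in the way, a possible alternative is to upload from Python with the azure-storage-blob SDK, where "$web" is just a literal string. A sketch, assuming a connection string in an environment variable and a hypothetical ./dist build folder:
import os
from azure.storage.blob import ContainerClient

# "$web" is passed as a plain Python string, so no shell quoting is involved.
container = ContainerClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], container_name="$web")

source = "./dist"  # hypothetical build output folder
for root, _, files in os.walk(source):
    for name in files:
        path = os.path.join(root, name)
        blob_name = os.path.relpath(path, source).replace(os.sep, "/")
        with open(path, "rb") as data:
            container.upload_blob(name=blob_name, data=data, overwrite=True)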
This will probably not solve your problem, but it will solve a related problem for other people:
If the aim is simply to download a file from Azure File Storage using a link after generating a SAS token, as shown in: Azure File Storage URL in browser showing InvalidHeaderValue
If you remove the slash after the name of the file in the generated link, the file will download!
Verify that the container name or connection string doesn't contain extra/disallowed symbols... In my case it was extra spaces in the container name.

Error while loading parquet format file into Amazon Redshift using copy command and manifest file

I'm trying to load parquet file using manifest file and getting below error.
query: 124138 failed due to an internal error. File 'https://s3.amazonaws.com/sbredshift-east/data/000002_0 has an invalid version number: )
Here is my copy command
copy testtable from 's3://sbredshift-east/manifest/supplier.manifest'
IAM_ROLE 'arn:aws:iam::123456789:role/MyRedshiftRole123'
FORMAT AS PARQUET
manifest;
Here is my manifest file:
{
  "entries": [
    {
      "url": "s3://sbredshift-east/data/000002_0",
      "mandatory": true,
      "meta": {
        "content_length": 1000
      }
    }
  ]
}
I'm able to load the same file using the copy command by specifying the file name directly:
copy testtable from 's3://sbredshift-east/data/000002_0' IAM_ROLE 'arn:aws:iam::123456789:role/MyRedshiftRole123' FORMAT AS PARQUET;
INFO: Load into table 'supplier' completed, 800000 record(s) loaded successfully.
COPY
What could be wrong in my copy statement?
This error happens when the content_length value is wrong. You have to specify the correct content_length. You can check it by executing an s3 ls command:
aws s3 ls s3://sbredshift-east/data/
2019-12-27 11:15:19 539 sbredshift-east/data/000002_0
The 539 (file size) should be the same as the content_length value in your manifest file.
I don't know why this meta value is required here when you don't need it in the direct copy command.
¯\_(ツ)_/¯
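If you'd rather not read the sizes off s3 ls by hand, a small boto3 sketch (assuming the bucket and prefix from the question) can generate the manifest with the correct content_length for each file:
import json
import boto3

s3 = boto3.client("s3")
bucket, prefix = "sbredshift-east", "data/"

entries = []
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
    entries.append({
        "url": f"s3://{bucket}/{obj['Key']}",
        "mandatory": True,
        # content_length must match the actual object size in bytes.
        "meta": {"content_length": obj["Size"]},
    })

print(json.dumps({"entries": entries}, indent=2))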
The only way I've gotten a Parquet COPY to work with a manifest file is to add the meta key with the content_length.
From what I can gather in my error logs, the COPY command for Parquet (with a manifest) might first be reading the files using Redshift Spectrum as an external table. If that's the case, this hidden step does require the content_length, which contradicts their initial statement about COPY commands.
https://docs.amazonaws.cn/en_us/redshift/latest/dg/loading-data-files-using-manifest.html

Deleting all blobs inside a path prefix using google cloud storage API

I am using the Google Cloud Storage Python API. I came across a situation where I need to delete a folder that might have hundreds of files using the API. Is there an efficient way to do it without making recursive and multiple delete calls?
One solution that I have is to list all blob objects in the bucket with the given path prefix and delete them one by one.
The other solution is to use gsutil:
$ gsutil rm -R gs://bucket/path
Try something like this:
bucket = storage.Client().bucket(bucket_name)
# Iterate over every blob and delete the ones under the given prefix.
# Note: blob names do not start with a leading slash.
for blob in bucket.list_blobs():
    if blob.name.startswith('path/'):
        blob.delete()
And if you want to delete the contents of a bucket instead of a folder within a bucket you can do it in a single method call as such:
bucket = storage.Client().bucket(bucket_name)
bucket.delete_blobs(bucket.list_blobs())
from google.cloud import storage


def deleteStorageFolder(bucketName, folder):
    """
    This function deletes from GCP Storage
    :param bucketName: The bucket name in which the file is to be placed
    :param folder: Folder name to be deleted
    :return: returns nothing
    """
    cloudStorageClient = storage.Client()
    bucket = cloudStorageClient.bucket(bucketName)
    try:
        # Delete every blob whose name starts with the given folder prefix.
        bucket.delete_blobs(blobs=bucket.list_blobs(prefix=folder))
    except Exception as e:
        print(e)
In this case folder = "path"
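A hypothetical call, matching the names above:
deleteStorageFolder("my-bucket", "path/")  # "my-bucket" is a placeholder; the trailing slash keeps the match to that folder only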

Using Filepicker.io, uploading files into a folder in an S3 Bucket

Can I upload files into a specific folder in an S3 bucket rather than just uploading into the base folder of the bucket?
Yes, use the "path" parameter on the filepicker.store call.
filepicker.store(fpfile, {location: 'S3', path: 'myfolder/file.png'},
  function(stored_fpfile){
    console.log(stored_fpfile);
  });
Documentation at https://developers.filepicker.io/docs/web/#store
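Aside from Filepicker's own API, the underlying S3 behaviour is the same from any client: a "folder" is just a key prefix. A minimal boto3 sketch (placeholder bucket and file names) showing the same idea:
import boto3

s3 = boto3.client("s3")
# S3 has no real folders; "myfolder/" is simply part of the object key.
s3.upload_file("file.png", "my-bucket", "myfolder/file.png")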